What is the best method for ranking items that have both positive and negative reviews? Some sites, including Reddit, have adopted an algorithm suggested by Evan Miller to generate their item rankings. However, this algorithm can be unfairly pessimistic about new, good items. This is especially true of items whose first few votes are negative, a weakness that adversaries can exploit to “game” the rankings. In this post, we consider three alternative ranking methods that allow high-quality items to bubble up more easily. The last is the simplest, yet still gives good results: one simply seeds each item’s vote count with a suitable fixed number of hidden “starter” votes.
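To make the contrast concrete, here is a minimal Python sketch. It assumes that the algorithm attributed to Evan Miller is the lower bound of the Wilson score interval, and it uses hypothetical starter counts (`starter_up=3`, `starter_down=1`) and hypothetical function names chosen purely for illustration; the full post may use different values and a different formulation.

```python
import math


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the up-vote fraction.

    z = 1.96 corresponds to a 95% two-sided interval (an assumed default).
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def seeded_score(up, down, starter_up=3, starter_down=1):
    """Up-vote fraction after adding hidden "starter" votes.

    The starter counts are illustrative assumptions, not values from the post.
    """
    return (up + starter_up) / (up + down + starter_up + starter_down)


if __name__ == "__main__":
    # Compare the two scores for a few (up, down) vote histories.
    for up, down in [(0, 1), (1, 0), (5, 1), (100, 10)]:
        print(up, down,
              round(wilson_lower_bound(up, down), 3),
              round(seeded_score(up, down), 3))
```

Under these assumptions, a new item whose only vote is negative scores essentially zero on the Wilson lower bound, while the seeded score still leaves it room to recover as more votes arrive.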


# Statistics

## Multivariate Cramer-Rao inequality

The Cramer-Rao inequality addresses the question of how accurately one can estimate a set of parameters $\vec{\theta} = \{\theta_1, \theta_2, \ldots, \theta_m \}$ characterizing a probability distribution $P(x) \equiv P(x; \vec{\theta})$, given only some samples $\{x_1, \ldots, x_n\}$ taken from $P$. Specifically, the inequality provides a rigorous lower bound on the covariance matrix of any set of unbiased estimators of these $\{\theta_i\}$ values. In this post, we review the general, multivariate form of the inequality, including its significance and proof.
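For orientation, the standard statement of the bound can be written as follows. The symbols $\hat{\theta}_i$ (the estimators) and $I(\vec{\theta})$ (the Fisher information matrix of the full sample) are notation assumed here for illustration; the full post may use different symbols or conventions.

$$
I_{ij}(\vec{\theta}) = \mathrm{E}\left[ \frac{\partial \log P(x_1, \ldots, x_n; \vec{\theta})}{\partial \theta_i} \, \frac{\partial \log P(x_1, \ldots, x_n; \vec{\theta})}{\partial \theta_j} \right],
\qquad
\mathrm{cov}(\hat{\theta}) - I(\vec{\theta})^{-1} \succeq 0 .
$$

Here $\mathrm{cov}(\hat{\theta})_{ij} = \mathrm{E}[(\hat{\theta}_i - \theta_i)(\hat{\theta}_j - \theta_j)]$, and $\succeq 0$ means the matrix difference is positive semidefinite. In particular, each diagonal entry obeys $\mathrm{var}(\hat{\theta}_i) \geq [I(\vec{\theta})^{-1}]_{ii}$. If the samples are independent and identically distributed, $I(\vec{\theta})$ is $n$ times the Fisher information of a single sample, so the bound tightens as $1/n$.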


## Mathematics of measles

Here, we introduce, and outline a solution to, a generalized SIR model for infectious disease. This model is referenced in our subsequent post on measles and vaccination rates. It differs from the original SIR model of Kermack and McKendrick in that we allow for two susceptible sub-populations, one vaccinated against the disease and one not. We conclude by presenting some Python code that integrates the equations numerically. An example solution obtained using this code is given below.
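As a rough illustration of what such an integration involves, and not the post's own code, the sketch below integrates one plausible form of a two-susceptible-group SIR model with a hand-written fourth-order Runge-Kutta step. The equations, the parameter names (`beta_u`, `beta_v`, `gamma`), and the initial conditions are all assumptions made for this example: unvaccinated and vaccinated susceptibles are infected at different rates, and infected individuals recover at rate `gamma`.

```python
def derivatives(state, beta_u, beta_v, gamma):
    """Right-hand side of the assumed model, with populations as fractions.

    state = (s_u, s_v, i, r): unvaccinated susceptible, vaccinated
    susceptible, infected, recovered.
    """
    s_u, s_v, i, r = state
    new_u = beta_u * s_u * i  # new infections among the unvaccinated
    new_v = beta_v * s_v * i  # new infections among the vaccinated
    return (-new_u, -new_v, new_u + new_v - gamma * i, gamma * i)


def rk4_step(state, dt, beta_u, beta_v, gamma):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = derivatives(state, beta_u, beta_v, gamma)
    k2 = derivatives(shifted(k1, dt / 2), beta_u, beta_v, gamma)
    k3 = derivatives(shifted(k2, dt / 2), beta_u, beta_v, gamma)
    k4 = derivatives(shifted(k3, dt), beta_u, beta_v, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, t_end, dt, beta_u, beta_v, gamma):
    """Return the list of states from t = 0 to t = t_end in steps of dt."""
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, beta_u, beta_v, gamma)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Hypothetical scenario: 90% vaccinated, 9% unvaccinated, 1% infected.
    start = (0.09, 0.90, 0.01, 0.0)
    path = integrate(start, t_end=50.0, dt=0.05,
                     beta_u=2.0, beta_v=0.1, gamma=0.5)
    print("final state (s_u, s_v, i, r):", path[-1])
```

Because the four derivatives sum to zero, the total population fraction stays constant throughout the run, which gives a simple sanity check on any integrator used for this kind of model.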
