# Statistically Accurate Ratings

In a previous post on ratings I noted some issues with using the mean versus the median for the rating. A few days ago Jeff Atwood posted on user ratings, and specifically on how to sort a set of items that are rated by users. Jeff’s post included material from Evan Miller’s article on the same subject, “How Not To Sort By Average Rating”.

Below is the relevant portion of Evan Miller’s article:

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

Say what: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:

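$$\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p}) + \frac{z_{\alpha/2}^2}{4n}}{n}}}{1 + \frac{z_{\alpha/2}^2}{n}}$$

(For a lower bound use minus where it says plus/minus.) Here $\hat{p}$ is the observed fraction of positive ratings, $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $n$ is the total number of ratings. The same formula implemented in Ruby: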

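```ruby
require 'statistics2'

# Lower bound of the Wilson score interval for the fraction of positive ratings.
def ci_lower_bound(pos, n, power)
  return 0 if n == 0

  # z is the (1 - power/2) quantile of the standard normal distribution
  z = Statistics2.pnormaldist(1 - power / 2.0)
  phat = 1.0 * pos / n
  (phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
end
```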

`pos` is the number of positive ratings, `n` is the total number of ratings, and `power` plays the role of the significance level α in the formula above (despite its name, it is not statistical power in the usual sense): pick 0.10 to have a 95% chance that your lower bound is correct, 0.05 to have a 97.5% chance, etc.
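To make the link between `power` and the resulting chance more concrete, here is a small sanity check that uses only Ruby's standard library. The z values are the commonly tabulated normal quantiles, rounded to four decimals. They are my assumption here, not the output of `Statistics2`, and `normal_cdf` and `Z_FOR_POWER` are names made up for this sketch.

```ruby
# Standard normal CDF, built from the error function in Ruby's Math module.
def normal_cdf(z)
  0.5 * (1 + Math.erf(z / Math.sqrt(2)))
end

# Assumed, commonly tabulated quantiles for 1 - power/2 (rounded).
Z_FOR_POWER = { 0.10 => 1.6449, 0.05 => 1.9600 }

Z_FOR_POWER.each do |power, z|
  # P(Z <= z) should come out close to 1 - power/2.
  puts format("power=%.2f  z=%.4f  P(Z <= z)=%.4f", power, z, normal_cdf(z))
end
```

The printed probabilities should land close to 0.95 and 0.975, which is where the "95% chance" and "97.5% chance" figures come from.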

Now for any item that has a bunch of positive and negative ratings, use that function to arrive at a score appropriate for sorting on, and be confident that you are using a good algorithm for doing so.
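As a rough illustration of sorting on this score, here is a sketch that ranks a few made-up items. It does not need the `statistics2` gem, because it hard-codes z ≈ 1.6449 (the value I assume corresponds to `power` = 0.10). The item data and the names `wilson_lower_bound` and `rank` are hypothetical and not from Evan Miller's article.

```ruby
# Assumed z for a one-sided 95% bound, hard-coded to avoid a gem dependency.
Z_95_ONE_SIDED = 1.6449

# Same expression as ci_lower_bound, but taking z directly.
def wilson_lower_bound(pos, n, z = Z_95_ONE_SIDED)
  return 0.0 if n.zero?

  phat = pos.to_f / n
  (phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
end

# Hypothetical items: [name, positive ratings, total ratings]
ITEMS = [
  ["A", 1, 1],     # 100% positive, but only one rating
  ["B", 90, 100],  # 90% positive over many ratings
  ["C", 3, 10]     # 30% positive
]

# Sort by score, highest lower bound first, and return the names.
def rank(items)
  items.sort_by { |_name, pos, n| -wilson_lower_bound(pos, n) }.map(&:first)
end

p rank(ITEMS)
```

Item B comes out ahead of item A even though A has a perfect average, because a single rating leaves a lot of uncertainty about A's "real" fraction of positive ratings.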

Sadly, everybody has simply quoted the mathematical formula without linking to any material on how to derive it. I was able to find an article by Keith Dunnigan that outlines how it is derived, along with several other confidence intervals. Hopefully I can later go through my textbooks and do a full derivation.
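In the meantime, here is a rough sketch of the standard textbook argument as I understand it. It is not taken from Dunnigan's article and it is not a rigorous derivation. It assumes the usual normal approximation for the observed fraction $\hat{p}$ around the true fraction $p$, and it writes $z$ for $z_{\alpha/2}$. The interval endpoints are the values of $p$ that satisfy

$$(\hat{p} - p)^2 = z^2 \, \frac{p(1-p)}{n}$$

Expanding and collecting terms gives a quadratic in $p$:

$$\left(1 + \frac{z^2}{n}\right) p^2 - \left(2\hat{p} + \frac{z^2}{n}\right) p + \hat{p}^2 = 0$$

Its discriminant simplifies to

$$\left(2\hat{p} + \frac{z^2}{n}\right)^2 - 4\left(1 + \frac{z^2}{n}\right)\hat{p}^2 = 4z^2 \, \frac{\hat{p}(1-\hat{p}) + \frac{z^2}{4n}}{n}$$

so the quadratic formula yields

$$p = \frac{\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p}) + \frac{z^2}{4n}}{n}}}{1 + \frac{z^2}{n}}$$

Taking the minus sign gives the lower bound, which is the same expression that the Ruby function evaluates.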