When viewing most YouTube videos you can rate the video, say on a scale from 0 to 5. The overall video rating, I assume, is the mean of the ratings, that is

rating = (r_1 + r_2 + ... + r_n) / n,

where r_i is the i-th user's rating and n is the number of ratings.
A problem with this rating technique is sensitivity to outliers, e.g. people voting 0 or 5 regardless of the video's quality. One solution is for YouTube to use the median instead of the mean. The median can be computed in linear time, but the computation is more expensive than maintaining a running mean; see selection algorithms.
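To make the outlier problem concrete, here is a small sketch (the ratings list is made up for illustration) comparing the mean and the median of the same votes:

```python
from statistics import mean, median

# Hypothetical ratings on a 0-5 scale: most viewers rate the
# video 4 or 5, but a few outliers vote 0.
ratings = [4, 4, 5, 4, 4, 4, 5, 0, 0, 0]

print(mean(ratings))    # → 3.0, dragged down by the three zeros
print(median(ratings))  # → 4.0, unaffected by the outliers
```

Note that Python's `statistics.median` sorts the data, which is O(n log n); a selection algorithm such as quickselect brings this down to expected linear time.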
But would using the median actually change the overall video rating? To answer this we need to analyze how people rate videos. Is someone's rating of a video independent of the previous ratings?
It is possible that people give a high rating if they believe the video has been rated too low, and vice versa. In this scenario the ratings are self-correcting, and the mean is effectively the same as the median!
It is also possible that people rate a video independently of the previous ratings, in which case using the median is preferable.
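The two scenarios can be contrasted with a toy simulation. Everything here is made up for illustration: the assumed true quality, the noise level, the 0.2 tolerance band, and the rater models themselves are hypothetical, not a claim about how real users behave.

```python
import random
from statistics import mean, median

random.seed(0)
TRUE_QUALITY = 4.0  # assumed "true" quality of the video

def independent_rater(shown):
    # Ignores the displayed rating; rates from personal taste only.
    return max(0, min(5, round(random.gauss(TRUE_QUALITY, 1.2))))

def corrective_rater(shown):
    # Votes 5 if the displayed rating looks too low, 0 if it looks
    # too high, and an honest rating otherwise.
    if shown < TRUE_QUALITY - 0.2:
        return 5
    if shown > TRUE_QUALITY + 0.2:
        return 0
    return round(TRUE_QUALITY)

def run(rater, n=2000):
    ratings = []
    for _ in range(n):
        shown = mean(ratings) if ratings else 2.5  # neutral start
        ratings.append(rater(shown))
    return mean(ratings), median(ratings)

print("independent raters (mean, median):", run(independent_rater))
print("corrective raters  (mean, median):", run(corrective_rater))
```

With corrective raters the running mean is pulled back toward the true quality, so the mean already behaves like a robust statistic; with independent raters the mean and median are free to differ.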
To determine which of these scenarios better describes user behavior, we can run the following two tests.
Setup: use the same video in both tests and large random sets of users, so that the tests can be compared against each other.
Test1: The video rating is given by the mean of the ratings and a user can see the video rating before they rate it.
Test2: The user cannot see the video rating before they rate the video and again the video rating is given by the mean of the ratings.
If the overall video rating is the same in both tests, then it strongly suggests that users rate videos independently of the previous ratings. Of course, if the distribution of user ratings is normal, then the mean and median coincide and Test1 and Test2 will agree regardless. One way to modify the experiment in this case is to add a few outlier ratings to Test1 and then see if the users correct the rating back to what it would be without the outliers.
Now if the video rating in Test1 agrees with the median of the ratings in Test2, then we have good evidence that using the mean is an adequate substitute for the median.