Probably not what you're looking for, but possibly related: our mutual acquaintance Leonard Richardson built a web-page recommendation engine called the Ultra Gleeper, which deliberately discards the most-linked-to things in your circle of awareness via an algorithm he calls "the Indie Rock Pete principle." There's more about it in the intro paper on that page.
I remember when he announced it! The Indie Rock Pete principle is indeed a very solid one.
RYAN. Remember when I went on a long rant about RSS feeds and suggestion engines and stuff? THIS WAS WHAT I WANTED.
I can't even remember what I dreamed and what happened but I'm glad I pointed you in the right direction eventually!
Is this the article you're looking for? I also found it very enlightening.
YES! It is almost literally a dream come true. Thank you!
There's an easier way...
Suppose you have the Amazon system of scoring things from 1-5. Start everything with a single vote of 3.
Then suppose you have two products, one with five 5s and one with fifty 5s. The second one will rank higher, because the fifty votes pull the average closer to a perfect 5: five 5s plus the seeded 3 average (3 + 25)/6 ≈ 4.67, while fifty 5s average (3 + 250)/51 ≈ 4.96. A single 5 vote gives the product an average score of (3 + 5)/2 = 4.
I use this system on my own site.
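A minimal sketch of that trick, assuming names of my own choosing (the commenter doesn't give code):

```python
def seeded_average(ratings, seed_value=3.0, seed_votes=1):
    """Average that starts every item with seed_votes phantom votes of seed_value."""
    total = seed_value * seed_votes + sum(ratings)
    count = seed_votes + len(ratings)
    return total / count

# One 5-star vote:    (3 + 5) / 2   = 4.0
# Five 5-star votes:  (3 + 25) / 6  ≈ 4.67
# Fifty 5-star votes: (3 + 250)/51  ≈ 4.96
```

The phantom vote is what keeps a single rave review from instantly outranking a well-established item.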
Essentially what you're doing is a very weak Bayesian average. You're assuming that any item gets one vote on average, and that the votes on all items average to 3. That gets you some of the benefits of a Bayesian average (i.e. a rating has to prove its credibility with a large number of votes before it can move away from the average), and you can strengthen it by adding more phantom samples, making it harder to push the rating off the average. For the proper Bayesian average, the number of phantom samples should be the mean number of votes an item receives, and the value of those samples should be the actual mean of all votes received across all items.
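The "proper" version described above can be sketched like this (function and variable names are my own; the formula is the standard Bayesian average with pseudo-count C and site-wide mean m):

```python
def bayesian_average(item_ratings, all_items_ratings):
    """Bayesian average: seed each item with C phantom votes of the global mean m,
    where C is the mean number of votes an item receives and m is the mean of
    all votes on all items."""
    total_votes = sum(len(r) for r in all_items_ratings)
    C = total_votes / len(all_items_ratings)          # mean votes per item
    m = sum(sum(r) for r in all_items_ratings) / total_votes  # global mean vote
    return (C * m + sum(item_ratings)) / (C + len(item_ratings))
```

With C phantom votes instead of one, an item needs proportionally more real votes to drag its score away from the site-wide mean.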
Oh, cool. I've heard "Bayesian average" tossed around before and I've even visited sites that use it (boardgamegeek), but I'd never bothered to look up the specifics.
I kind of feel proud about coming up with a poor man's version independently.
Umm... I'm not a statistician, but it seems to me that the obvious measure you want is the expected value of the Bernoulli parameter (p in the link) given the known information (namely the numbers of upvotes and downvotes), not this Wilson what-a-ma-jig, whatever it is. A simple Bayesian analysis gives that it is:
E(p)= integral from 0 to 1 of [P(p) p^(n_u+1) (1-p)^(n_d) dp] / integral from 0 to 1 of [P(p) p^(n_u) (1-p)^(n_d) dp]
Where P(p) is a prior probability distribution on the Bernoulli parameters, which you can approximate presumably from knowing the scores of other items on your site that have lots of votes; n_u is number of upvotes; n_d is number of downvotes. Instead of doing integrals you can change your models to a finite number of possible Bernoulli parameters and get sums instead.
For example, given a uniform prior distribution between 0 and 1, an item with 2 upvotes and 1 downvote will have E(p)=3/5. 1 upvote, no downvotes would give E(p)=2/3. Actually, for a uniform prior, this gives exactly the same result as starting off each item with one upvote and one downvote, which is kind of what moogle suggested.
Was John Leguizamo in this dream? Did he betray you?

**(Deleted comment)**
He will betray you, Naseem.
Wellity, well well... Look who has been unbanned from this blog!
I am all the time stumbling upon hand-written websites on loose-leaf paper.
Guess what I did! I finally posted day one of my hourlies that I did a month ago on my LJ account! They are awful! HOORAY!!
I approve and don't think they're awful!