DID IT HAPPEN OR WAS IT HEADY FANTASY??
[Feb. 17th, 2009|02:42 pm]
A few days ago I am ALMOST CERTAIN I read an article about ratings for websites, and the proper way to calculate your top lists with them: "highest rated items", things like that. They're tricky, because when you show something like "Top 10 things", these lists tend to be static. That's why most websites have an "in the past 7 days" rider, the article said, to shake things up, because once something enters that list it tends to stay. And you want to rate an item with ten 5 star ratings and one 1 star rating higher than something with ten 3 1/2 star ratings. Right?
The article also specifically mentioned Amazon.com and another site as doing this sort of thing wrong, and showed examples from their sites. The trick was the way you'd expect to do it (the naive way) has some hidden flaws. The article then gave a better way of doing it involving some complicated math and some simple code. It was in a trendy language. Ruby? Python?
Anyway I'm pretty sure I skimmed the article, but I can't find it. And I had a dream where my friend Pat and I were looking for the article, but all we found was a hand-written version of it (?) and it was missing some pages. When I woke up I wasn't sure if the article had ever existed in the first place, or if I had dreamed it! I'm pretty sure I skimmed it and THEN dreamed about it, but any memory I have of skimming a random website is so flimsy that it may have been part of the dream too.
So if you came across this article please let me know! I have not been confused as to whether something happened or I dreamed it in at least a decade, and I wish it was over something way cooler than a website showing you a better way to generate a list on a computer.
UPDATE: Here it is! gregstoll shares my dreams.
Probably not what you're looking for, but possibly related: our mutual acquaintance Leonard Richardson built a web-page recommendation engine called the Ultra Gleeper, which deliberately discards the most-linked-to things in your circle of awareness via an algorithm he calls "the Indie Rock Pete principle." There's more about it in the intro paper on that page.
I remember when he announced it! The Indie Rock Pete principle is indeed a very solid one.
RYAN. Remember when I went on a long rant about RSS feeds and suggestion engines and stuff? THIS WAS WHAT I WANTED.
I can't even remember what I dreamed and what happened but I'm glad I pointed you in the right direction eventually!
the article you're looking for? I also found it very enlightening.
YES! It is almost literally a dream come true. Thank you!
There's an easier way...
Suppose you have the Amazon system of scoring things from 1-5. Start everything with a single vote of 3.
Then suppose you have two products, one with five 5s and one with fifty 5s. The second one will rank higher because the fifty votes skew it closer to a perfect 5. A single 5 vote gives the product an average score of 4.
I use this system on my own site.
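For anyone who wants to see the numbers, here is a minimal Python sketch of the "start everything with a single vote of 3" idea. The function name `seeded_average` and the `seed` parameter are made up for illustration; this is not the code from anyone's actual site.

```python
def seeded_average(votes, seed=3.0):
    """Average of the real votes plus one extra pretend vote (the seed).

    `votes` is a list of 1-5 star ratings. The seed of 3 is the assumption
    from the comment above, not a recommended value.
    """
    return (seed + sum(votes)) / (1 + len(votes))


single = seeded_average([5])        # one 5-star vote plus the seed: (3 + 5) / 2
five = seeded_average([5] * 5)      # five 5-star votes
fifty = seeded_average([5] * 50)    # fifty 5-star votes

print(single, five, fifty)
```

The item with fifty 5s ends up closer to a perfect 5 than the one with five 5s, because the single seed vote matters less the more real votes there are.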
Essentially what you're doing is a very weak Bayesian average. You're assuming that any item will get 1 vote on average, and that all the votes on all items average to 3. That gets you some of the benefits of a Bayesian average (i.e. that a rating needs to prove its credibility by having large numbers of votes away from the average), but you can strengthen it by adding more samples, making it harder to push the rating off of the average. For the proper Bayesian average, the number of samples added should be the mean number of votes that an item receives, and the value of those extra samples should be the actual mean of all votes received on all items.
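Roughly, the Bayesian average described above could be sketched like this in Python. The names `site_prior` and `bayesian_average` and the sample data are hypothetical, and the sketch assumes the site has at least one item and at least one vote.

```python
def site_prior(all_items):
    """Work out the prior from every vote on the site.

    `all_items` maps an item name to its list of ratings.
    Returns (mean of all votes, mean number of votes per item).
    Assumes at least one item and at least one vote overall.
    """
    total_votes = sum(len(v) for v in all_items.values())
    total_score = sum(sum(v) for v in all_items.values())
    prior_mean = total_score / total_votes
    prior_weight = total_votes / len(all_items)
    return prior_mean, prior_weight


def bayesian_average(votes, prior_mean, prior_weight):
    """Blend an item's own votes with `prior_weight` pretend votes at `prior_mean`."""
    return (prior_weight * prior_mean + sum(votes)) / (prior_weight + len(votes))


# Made-up ratings, just to exercise the functions.
items = {
    "a": [5, 5, 5, 5],
    "b": [3, 3],
    "c": [1, 4, 4, 2, 3, 5],
}
mean, weight = site_prior(items)
for name, votes in items.items():
    print(name, bayesian_average(votes, mean, weight))
```

Setting `prior_mean=3` and `prior_weight=1` gives back the single-seed-vote scheme, which is why it counts as a weak version of the same thing.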
Oh, cool. I've heard "Bayesian average" tossed around before and I've even visited sites that use it (boardgamegeek), but I'd never bothered to look up the specifics.
I kind of feel proud about coming up with a poor man's version independently.
Umm... I'm not a Statistician, but it seems to me like the obvious measure you want to look at is the expected value of the Bernoulli parameter (p in the link) given the known information (namely numbers of upvotes and downvotes); not this Wilson what-a-ma-jig, whatever it is. A simple Bayesian analysis gives that it is given by:
E(p)= integral from 0 to 1 of [P(p) p^(n_u+1) (1-p)^(n_d) dp] / integral from 0 to 1 of [P(p) p^(n_u) (1-p)^(n_d) dp]
Where P(p) is a prior probability distribution on the Bernoulli parameters, which you can approximate presumably from knowing the scores of other items on your site that have lots of votes; n_u is number of upvotes; n_d is number of downvotes. Instead of doing integrals you can change your models to a finite number of possible Bernoulli parameters and get sums instead.
For example, given a uniform prior distribution between 0 and 1, an item with 2 upvotes and 1 downvote will have E(p)=3/5. 1 upvote, no downvotes would give E(p)=2/3. Actually, for a uniform prior, this gives exactly the same result as starting off each item with one upvote and one downvote, which is kind of what moogle suggested.
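As a sanity check on those numbers, here is a small Python sketch. For the uniform prior it uses the "one extra upvote and one extra downvote" shortcut, and for a general prior it uses the finite-sum version mentioned above. The function names and the 1000-point grid are assumptions made up for this example.

```python
from fractions import Fraction


def expected_p_uniform(n_up, n_down):
    """E(p) under a uniform prior: same as adding one upvote and one downvote."""
    return Fraction(n_up + 1, n_up + n_down + 2)


def expected_p_discrete(n_up, n_down, prior):
    """E(p) with the integrals replaced by sums over a finite set of p values.

    `prior` is a list of (p, weight) pairs standing in for P(p).
    """
    num = sum(w * p ** (n_up + 1) * (1 - p) ** n_down for p, w in prior)
    den = sum(w * p ** n_up * (1 - p) ** n_down for p, w in prior)
    return num / den


# A flat prior on 1000 evenly spaced values of p strictly between 0 and 1.
grid = [((i + 0.5) / 1000, 1.0) for i in range(1000)]

print(expected_p_uniform(2, 1))                 # 2 upvotes, 1 downvote
print(expected_p_uniform(1, 0))                 # 1 upvote, no downvotes
print(expected_p_discrete(2, 1, grid))          # should land very close to 3/5
```

To use a non-uniform prior, swap the flat weights in `grid` for weights estimated from items that already have lots of votes.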
Was John Leguizamo in this dream? Did he betray you?
He will betray you, Naseem.
Wellity, well well... Look who has been unbanned from this blog!
I am all the time stumbling upon hand-written websites on loose-leaf paper.
Guess what I did! I finally posted day one of my hourlies that I did a month ago on my LJ account! They are awful! HOORAY!!
I approve and don't think they're awful!