Lessons on recommendation systems
Recently I had the opportunity to guest-lecture a class at Stanford called Data Mining and Electronic Business, which is taught by my friend (and former Chief Scientist at Amazon) Andreas Weigend. This is a graduate-level class in the Statistics and Management Science departments — if you’re at Stanford and are interested in how data-mining is valuable to businesses, I’d highly recommend this class.
The subject of the lecture was “Recommendation Systems”. Andreas was partly responsible for turning Amazon into a very data-driven culture, so I was very excited to work with him on this. It was a great experience: I learned a lot, and hopefully the class did too. I thought I’d share some of the main points:
- Amazon makes 20-30% of its sales from recommendations. Only 16% of people go to Amazon with explicit intent to buy something
- The data that you collect matters much more than the algorithm you use. Amazon’s algorithm is essentially a large product-product correlation matrix for the past hour, but it works for them because they collect so much data through user actions
- Many problems including shopping, targeted advertising, dating, finding events, etc. can be framed as recommendation problems
- Very important takeaway: find ways to collect as much user input as possible without being disruptive. People don’t train systems, they try to benefit themselves, but this is the best kind of training data
- There are a lot of different types of data that can train a system: votes, clicks, page-view time, purchases, tagging, adding a title — the user does these things anyway, and you can use the data
- A/B testing is an effective and underused way to learn about people. Simply by varying the way you phrase something, you can learn more about your users
- Very few systems now are combining metadata or content with collaborative filtering. The consensus in the class when discussing a music recommendation system was that this could be very effective
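To make the “product-product correlation matrix” point concrete, here is a minimal sketch in Python. It is not Amazon’s actual implementation, which the lecture did not spell out. It assumes a log of (user, item, timestamp) events, keeps only those from the past hour, and scores each pair of items by cosine similarity over the users who touched both. The names `item_item_scores` and `recommend`, the event format, and the choice of cosine similarity are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_item_scores(events, now, window_seconds=3600):
    """Score item pairs from recent events.

    events: iterable of (user_id, item_id, timestamp_in_seconds) tuples.
    This event format is an assumption, not something from the lecture.
    """
    # Keep only events inside the time window, grouped by user.
    items_by_user = defaultdict(set)
    for user, item, ts in events:
        if now - ts <= window_seconds:
            items_by_user[user].add(item)

    # Count how many users touched each item and each pair of items.
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in items_by_user.values():
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    # Cosine similarity between the binary "who touched it" vectors.
    scores = defaultdict(dict)
    for (a, b), together in pair_counts.items():
        sim = together / sqrt(item_counts[a] * item_counts[b])
        scores[a][b] = sim
        scores[b][a] = sim
    return scores


def recommend(scores, item, k=3):
    """Return up to k items most similar to `item`, best first."""
    ranked = sorted(scores.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]
```

The algorithm is deliberately simple, which matches the point of the bullets above: with enough captured user actions, pair counting of this kind can already produce usable suggestions.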
Much of the lecture was on frameworks for thinking about recommendations, algorithms and means of testing the quality, which don’t boil down to bullet points very easily.
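As one small illustration of the A/B testing point, here is a sketch of how two phrasings might be compared. It assumes you have counted how many users saw each variant and how many of them acted on it, and it uses a standard two-proportion z-test. The function name and the numbers in the usage note are hypothetical, not figures from the lecture.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value) for a two-sided test using a pooled proportion.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Nobody (or everybody) converted in both groups: no difference to detect.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Standard normal CDF via erf, then a two-sided p-value.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)
```

For example, `two_proportion_z_test(100, 1000, 130, 1000)` compares a 10% response rate against a 13% one. A small p-value suggests the difference in phrasing is unlikely to be noise.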
Much of the time when I pick up a paper on recommendation systems, the sense is “given this set of data, let’s design an algorithm to make better predictions”. So I think the overall message bears repeating: when thinking about recommendations or targeted content/advertising, the most important thing to think about is all the different things that people do on your site anyway (tagging, buying, titling, clicking). Determine how much of that you can capture, and try some basic, quick-running algorithms.
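A rough sketch of the “capture what people do anyway” step might look like the following. It assumes an event log of (user, item, action) tuples and folds the actions into one preference score per user and item, which can then feed any basic algorithm. The weights in `EVENT_WEIGHTS` are made-up values for illustration and would need tuning against real data.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each action signals interest.
EVENT_WEIGHTS = {"click": 1.0, "tag": 2.0, "title": 2.0, "purchase": 5.0}


def build_preferences(event_log, weights=EVENT_WEIGHTS):
    """Fold (user, item, action) events into {user: {item: score}}."""
    prefs = defaultdict(lambda: defaultdict(float))
    for user, item, action in event_log:
        if action not in weights:
            continue  # Ignore actions we have no weight for.
        prefs[user][item] += weights[action]
    # Return plain dicts so later lookups do not create empty entries.
    return {user: dict(items) for user, items in prefs.items()}
```

Nothing here asks the user to rate anything. The scores come entirely from actions people take for their own benefit.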
May 29th, 2008 at 6:05 pm
I’ll warn you now, this is a shameless plug, but it’s on-topic.
SenseArray (http://sensearray.com/) is a commercial collaborative filter that, among other things, allows you to provide metadata both for users and items, and employs this metadata to “bootstrap” its recommendations.
May 29th, 2008 at 7:27 pm
Any light as to what tools Amazon uses to analyze 1 hour of data?
“Amazon’s algorithm is essentially a large product-product correlation matrix for the past hour”
May 29th, 2008 at 11:58 pm
Very enlightening post. Recommendation systems could and should be used by more traditional companies such as banks, etc… but they are currently not comfortable with the type of transparency this might bring forth which is probably a rather shortsighted and fear-based decision. Any ideas on this topic?
May 30th, 2008 at 7:59 am
I don’t suppose anyone recorded the lecture? I’d love to hear it. (Or maybe a slide deck or similar?) I appreciate that there may well be reasons why that’s not possible but it would be great if it existed.
June 9th, 2008 at 2:02 pm
Toby, the http://weigend.com/teaching/stanford/ URL is broken.
I’ll second the query about a recording of the lecture.
July 13th, 2008 at 10:01 pm
It’s a very good lecture. Anyway, I want to know whether you can recommend any good Python clustering API for a recommender system?
Thanks!
November 13th, 2008 at 12:05 pm
“Very few systems now are combining metadata or content with collaborative filtering. The consensus in the class when discussing a music recommendation system was that this could be very effective”
Interesting! I’d say that most of the music rec. systems I know of are using some kind of (basic) combination of metadata (e.g. genre) and CF. Also, there is some work that combines audio content-based similarity with social behavior (either CF or social tagging).
Actually, there are some examples in the Music Recommendation Tutorial (ISMIR 2007) about how one can merge all these data sources:
http://www.slideshare.net/ocelma/music-recommendation-tutorial
Cheers, Oscar
November 9th, 2009 at 11:54 pm
How do you implement “recommendation” for the B2B side, where you don’t have that much data or input? Thanks & Best.