Lessons on recommendation systems
Recently I had the opportunity to guest-lecture a class at Stanford called Data Mining and Electronic Business, which is taught by my friend (and former Chief Scientist at Amazon) Andreas Weigend. This is a graduate-level class in the Statistics and Management Science departments — if you’re at Stanford and are interested in how data-mining is valuable to businesses, I’d highly recommend this class.
The subject of the lecture was “Recommendation Systems”. Andreas was partly responsible for turning Amazon into a very data-driven culture, so I was very excited to work with him on this. It was a great experience — I learned a lot and hopefully the class did too. I thought I’d share some of the main points:
- Amazon makes 20-30% of its sales from recommendations. Only 16% of people go to Amazon with explicit intent to buy something
- The data that you collect matters much more than the algorithm you use. Amazon’s algorithm is essentially a large product-product correlation matrix for the past hour, but it works for them because they collect so much data through user actions
- Many problems including shopping, targeted advertising, dating, finding events, etc. can be framed as recommendation problems
- Very important takeaway: find ways to collect as much user input as possible without being disruptive. People won’t deliberately train a system; they act to benefit themselves, and those actions are the best kind of training data
- There are a lot of different types of data that can train a system: votes, clicks, page-view time, purchases, tagging, adding a title — the user does these things anyway, and you can use the data
- A/B testing is an effective and underused way to learn about people. Simply by varying the way you phrase something, you can learn more about your users
- Very few systems now are combining metadata or content with collaborative filtering. The consensus in the class when discussing a music recommendation system was that this could be very effective
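To make the “product-product correlation matrix” point above concrete, here is a minimal sketch of item-item co-occurrence counting over purchase histories. The data and the lift-style score (co-purchases divided by item popularity) are my own illustrative choices, not Amazon’s actual method:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: user -> set of product IDs.
purchases = {
    "u1": {"book_a", "book_b", "cd_c"},
    "u2": {"book_a", "book_b"},
    "u3": {"book_b", "cd_c"},
}

# Count how often each product appears, and how often each pair
# of products co-occurs in the same user's history.
cooccur = defaultdict(int)
item_count = defaultdict(int)
for items in purchases.values():
    for item in items:
        item_count[item] += 1
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def recommend(item, k=2):
    """Rank other items by co-purchases normalized by their popularity."""
    scores = {
        other: n / item_count[other]
        for (a, other), n in cooccur.items()
        if a == item
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("book_a"))  # book_b co-occurs more often, so it ranks first
```

The whole thing is a pair of dictionary passes over the raw events, which is the spirit of the point: the value comes from having lots of user actions to count, not from a clever model.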
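On the A/B testing point: deciding whether a rephrasing actually changed behavior is a simple two-proportion z-test. The traffic numbers below are made up for illustration:

```python
from math import sqrt

def ab_z_score(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test: does variant B's click rate differ from A's?"""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se

# Hypothetical experiment: new wording lifts clicks from 5% to 6%.
z = ab_z_score(500, 10_000, 600, 10_000)
print(abs(z) > 1.96)  # significant at the 5% level?
```

A |z| above 1.96 means the difference would be surprising under the null hypothesis that both phrasings perform the same, so you have learned something real about your users.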
Much of the lecture was on frameworks for thinking about recommendations, algorithms and means of testing the quality, which don’t boil down to bullet points very easily.
Much of the time when I pick up a paper on recommendation systems, the sense is “given this set of data, let’s design an algorithm to make better predictions”. So I think the overall message bears repeating: when thinking about recommendations or targeted content/advertising, the most important thing to consider is all the different things people do on your site anyway — tagging, buying, titling, clicking. Determine how much of that you can capture, and try some basic, quick-running algorithms.