kiwitobes.com

Author, Software Developer, and Data Magnate


Lessons on recommendation systems

Recently I had the opportunity to guest-lecture a class at Stanford called Data Mining and Electronic Business, which is taught by my friend (and former Chief Scientist at Amazon) Andreas Weigend. This is a graduate-level class in the Statistics and Management Science departments. If you’re at Stanford and are interested in how data mining is valuable to businesses, I’d highly recommend this class.

The subject of the lecture was “Recommendation Systems”. Andreas was partly responsible for turning Amazon into a very data-driven culture, so I was very excited to work with him on this. It was a great experience: I learned a lot, and hopefully the class did too. I thought I’d share some of the main points:

  • Amazon makes 20-30% of its sales from recommendations. Only 16% of people go to Amazon with explicit intent to buy something
  • The data that you collect matters much more than the algorithm you use. Amazon’s algorithm is essentially a large product-product correlation matrix for the past hour, but it works for them because they collect so much data through user actions
  • Many problems, including shopping, targeted advertising, dating, and finding events, can be framed as recommendation problems
  • Very important takeaway: find ways to collect as much user input as possible without being disruptive. People don’t set out to train systems, they act to benefit themselves, and that is the best kind of training data
  • Many different types of data can train a system: votes, clicks, page-view time, purchases, tagging, adding a title. The user does these things anyway, and you can use the data
  • A/B testing is an effective and underused way to learn about people. Simply by varying the way you phrase something, you can learn more about your users
  • Very few systems currently combine metadata or content with collaborative filtering. The consensus in the class, when discussing a music recommendation system, was that this could be very effective
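
To make the “product-product correlation matrix” point concrete, here is a minimal sketch in Python. It is not Amazon’s actual method: it assumes each user’s recent actions are available as a list of item names, and it swaps in cosine similarity over co-occurrence counts as a simple stand-in for correlation. The names `item_similarities`, `recommend`, and the sample `baskets` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(baskets):
    """Build {item: {other_item: score}} from a {user: [items]} mapping.

    The score is cosine similarity on binary data: users who touched
    both items, divided by sqrt(users of item a * users of item b).
    """
    item_counts = defaultdict(int)  # users who touched each item
    pair_counts = defaultdict(int)  # users who touched both items of a pair
    for items in baskets.values():
        unique = sorted(set(items))  # ignore repeats within one user
        for item in unique:
            item_counts[item] += 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in pair_counts.items():
        score = both / sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score  # the matrix is symmetric
    return sims


def recommend(sims, item, n=3):
    """Return up to n items most similar to `item` (ties broken by name)."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]


# Hypothetical data: what each user bought or clicked in the time window.
baskets = {
    "alice": ["book", "lamp", "pen"],
    "bob": ["book", "pen"],
    "carol": ["lamp", "desk"],
    "dave": ["book", "pen", "desk"],
}
sims = item_similarities(baskets)
print(recommend(sims, "book"))
```

Restricting `baskets` to a recent window (the past hour, in the bullet above) is just a matter of filtering the input before calling `item_similarities`. The algorithm stays this simple; the quality comes from how much user activity feeds it.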

Much of the lecture was on frameworks for thinking about recommendations, algorithms and means of testing the quality, which don’t boil down to bullet points very easily.
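
One simple means of testing, in the spirit of the A/B testing point in the list above, is to compare conversion rates between two variants. The sketch below uses a standard two-proportion z-test with only the Python standard library. The function name and the sample counts are made up for illustration, and the normal approximation assumes reasonably large samples.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value), where p_value is two-sided and uses the
    normal approximation with a pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF via erf, then a two-sided tail probability.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical experiment: two phrasings of the same recommendation box.
z, p_value = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two phrasings is unlikely to be noise, but it says nothing about why users preferred one, so it works best alongside the other signals you collect.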

Much of the time when I pick up a paper on recommendation systems, the sense is “given this set of data, let’s design an algorithm to make better predictions”. So I think the overall message bears repeating: when thinking about recommendations or targeted content/advertising, the most important thing to think about is all the different things that people do on your site anyway (tagging, buying, titling, clicking). Determine how much of that you can capture, and try some basic, quick-running algorithms.
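
As a rough illustration of capturing what people do anyway, the sketch below folds a log of user actions into per-user item scores that a basic algorithm could consume. Everything here is an assumption for the example: the `(user, item, action)` event format, the action names, and especially the weights in `EVENT_WEIGHTS`, which are arbitrary and would need tuning against real data.

```python
from collections import defaultdict

# Hypothetical weights: stronger signals of interest count for more.
EVENT_WEIGHTS = {
    "click": 1.0,
    "tag": 2.0,
    "title": 2.0,
    "purchase": 5.0,
}


def build_preferences(events):
    """Turn (user, item, action) tuples into {user: {item: score}}.

    Actions without a configured weight are skipped rather than guessed.
    """
    prefs = defaultdict(lambda: defaultdict(float))
    for user, item, action in events:
        weight = EVENT_WEIGHTS.get(action)
        if weight is None:
            continue
        prefs[user][item] += weight
    return prefs


# Hypothetical log of things users did on the site without being asked.
events = [
    ("u1", "camera", "click"),
    ("u1", "camera", "purchase"),
    ("u1", "tripod", "click"),
    ("u2", "camera", "tag"),
    ("u2", "bag", "hover"),  # unknown action, ignored
]
prefs = build_preferences(events)
for user, scores in prefs.items():
    print(user, dict(scores))
```

The resulting scores can feed any simple item-to-item or user-to-user similarity calculation. The design choice worth noting is that nothing here asks the user to rate anything.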

12 Responses to “Lessons on recommendation systems”

  1. Ian Clarke:

    I’ll warn you now, this is a shameless plug, but it’s on-topic.

    SenseArray (http://sensearray.com/) is a commercial collaborative filter that, among other things, allows you to provide metadata both for users and items, and employs this metadata to “bootstrap” its recommendations.

  2. Alexander:

    Any light on what tools Amazon uses to analyze one hour of data?

    “Amazon’s algorithm is essentially a large product-product correlation matrix for the past hour”

  3. Peter T Webshop:

    Very enlightening post. Recommendation systems could and should be used by more traditional companies such as banks, etc… but they are currently not comfortable with the type of transparency this might bring forth which is probably a rather shortsighted and fear-based decision. Any ideas on this topic?

  4. Andrew Cherry:

    I don’t suppose anyone recorded the lecture? I’d love to hear it. (Or maybe a slide deck or similar?) I appreciate that there may well be reasons why that’s not possible but it would be great if it existed.

  5. links for 2008-06-05 « Brent Sordyl’s Blog:

    […] Lessons on recommendation systems find ways to collect as much user input as possible without being disruptive. People don’t train systems, they try to benefit themselves, but this is the best kind of training data (tags: recommendation) […]

  6. Michael R. Bernstein:

    Toby, the http://weigend.com/teaching/stanford/ URL is broken.

    I’ll second the query about a recording of the lecture.

  7. Fat Man - interactive design & development collective | Lessons on recommendation systems:

    […] Did you know that Amazon makes 20-30% of its sales through recommendations? This and other valuable lessons on building recommendation systems can be found here. […]

  8. sengly:

    It’s a very good lecture. Anyway, I want to know whether you can recommend any good Python clustering API for a recommender system?

    Thanks!

  9. Oscar:

    “Very few systems now are combining metadata or content with collaborative filtering. The consensus in the class when discussing a music recommendation system was that this could be very effective”

    Interesting! I’d say that most of the music rec. systems I know of use some kind of (basic) combination of metadata (e.g. genre) and CF. Also, there is some work that combines audio content-based similarity with social behavior (either CF or social tagging).

    Actually, there are some examples in the Music Recommendation Tutorial (ISMIR 2007) about how one can merge all these data sources:
    http://www.slideshare.net/ocelma/music-recommendation-tutorial

    Cheers, Oscar

  10. Moynahan:

    cool pics

  11. Dayama:

    How do you implement “recommendation” on the B2B side, where you don’t have that much data or input? Thanks & Best.

  12. springfield xd:

    I am thinking the same. Occasionally I just do not comprehend how folk can believe you’re incorrect.
