CS/CNS/EE 156b: Learning Systems (Winter 2007-08)

Class Time: Tuesday and Thursday, 2:30--3:55 pm in MRE 070.


Announcements

  • Thank you all for your participation! You are now free to work on the Netflix problem on your own as you see fit, and you may freely use any ideas learned in class in your work.
  • Please do not use the base/valid/hidden data sets any more. You can generate new sets using the same idea if you want to.


People

Instructor: Yaser Abu-Mostafa, yaser(at)caltech.edu
Secretary: Lucinda Acosta, lucinda(at)sunoptics.caltech.edu
TA: Hsuan-Tien Lin, cs156ta(at)work.caltech.edu
TA: Amrit Pratap, cs156ta(at)work.caltech.edu
Students: teams

We enjoyed a great tea party on 1/17/2008. The teams are now competing on the scoreboard.


Introduction

This is a project course. This year's theme is based on the Netflix Prize: predicting the rating that a user would give to a movie. The course will go through three stages: infrastructure, training, and aggregation. In the infrastructure stage, students will work on tasks that facilitate the later stages. In the training stage, each team (of at most two people) will implement its own ideas and present its findings. In the aggregation stage, the teams will study whether and how they can aggregate the methods implemented by the other teams in the class to achieve better performance. Grades will be based on the work done.

Note that there are over a hundred million examples in the Netflix data set. Students should therefore expect the tasks to be computationally intensive and time consuming, and to require solid programming expertise.
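
As a rough illustration of what the aggregation stage might involve, the sketch below blends the predictions of two methods with least-squares weights fitted on ratings whose true values are known. It is only one possible approach under stated assumptions, not the procedure used in class: the function names (rmse, fit_blend_weights, blend) and the tiny data in the usage note are hypothetical. RMSE is used because it is the error measure of the Netflix Prize.

```python
# Hypothetical sketch: linear blending of two predictors.
# All names and numbers here are illustrative assumptions.

def rmse(predictions, ratings):
    """Root-mean-square error between predictions and true ratings."""
    n = len(ratings)
    return (sum((p - r) ** 2 for p, r in zip(predictions, ratings)) / n) ** 0.5


def fit_blend_weights(pred_a, pred_b, ratings):
    """Solve the 2x2 normal equations for the weights (wa, wb) that minimize
    sum((wa * a + wb * b - r) ** 2) over the given examples."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    sar = sum(a * r for a, r in zip(pred_a, ratings))
    sbr = sum(b * r for b, r in zip(pred_b, ratings))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("predictions are collinear; cannot fit two weights")
    wa = (sar * sbb - sbr * sab) / det
    wb = (sbr * saa - sar * sab) / det
    return wa, wb


def blend(pred_a, pred_b, wa, wb):
    """Weighted combination of the two prediction lists."""
    return [wa * a + wb * b for a, b in zip(pred_a, pred_b)]
```

A usage sketch with made-up numbers: given ratings = [1, 3, 5, 4, 2] and two prediction lists a and b for the same examples, calling wa, wb = fit_blend_weights(a, b, ratings) and then blend(a, b, wa, wb) gives a combination whose error on those examples is no worse than that of either input. In practice the weights would be fitted on data that none of the blended methods was trained on, so that the blend does not simply reward overfitting.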


Rules

The participants of this course need to follow both the Netflix rules and the class rules. In case of conflict, the former supersede the latter in prize eligibility matters, and the latter supersede the former in academic matters.


Notes and Presentations


Data Sets


Discussion Forum


Tasks

  • data access -- writing utility programs to access the huge data set quickly
  • data study -- mining unusual behaviors/statistics in the data set
  • method review -- reading and summarizing existing algorithms
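
For the data access task, one common idea is to pack the ratings into flat typed arrays grouped by user, with an offset table for constant-time lookup. The sketch below is a minimal illustration under stated assumptions, not the utilities written for this class: it assumes users have been renumbered 0..num_users-1, that movie ids fit in 16 bits, and that ratings fit in one byte. The names build_index and ratings_of are hypothetical.

```python
from array import array


def build_index(triples, num_users):
    """Pack a list of (user, movie, rating) triples into flat typed arrays.

    Assumes users are renumbered 0..num_users-1 (an assumption of this
    sketch). Returns (offsets, movies, ratings), where the entries of user u
    occupy positions offsets[u] .. offsets[u + 1] - 1.
    """
    # Count ratings per user, then turn the counts into prefix sums.
    counts = [0] * (num_users + 1)
    for user, _, _ in triples:
        counts[user + 1] += 1
    offsets = array("l", counts)
    for u in range(num_users):
        offsets[u + 1] += offsets[u]

    movies = array("H", [0]) * len(triples)   # unsigned 16-bit movie ids
    ratings = array("B", [0]) * len(triples)  # one byte per rating

    # Second pass: drop each rating into the next free slot of its user.
    cursor = list(offsets[:num_users])
    for user, movie, rating in triples:
        pos = cursor[user]
        movies[pos] = movie
        ratings[pos] = rating
        cursor[user] += 1
    return offsets, movies, ratings


def ratings_of(user, offsets, movies, ratings):
    """Return the (movie, rating) pairs of one user as a list."""
    start, end = offsets[user], offsets[user + 1]
    return list(zip(movies[start:end], ratings[start:end]))
```

With this layout each rating costs three bytes plus a small per-user offset, and the arrays can be written to and read from disk with the array module's tofile and fromfile methods. The same construction keyed by movie instead of user gives the other access direction.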

Resources

  • Netflix Prize, rules, forum
  • The solution from the 2007 Progress Prize winner. Note that their solution is based on aggregating 107 methods.
  • pyflix: a Python library that facilitates fast access to the data set.
  • Netflix Recommender Framework: a small C++ framework that, hopefully, lets you think about the algorithm rather than about how to fit the database in memory.
  • SVD Demo: a demo implementation of the SVD algorithm in Visual C++. The author also posted some statistics of the data that he found interesting.
  • LingPipe: a Java library with an SVD implementation. You may want to see the author's blog post, which also contains some comments about the prize.
  • a nice introduction to SVD for Netflix: http://sifter.org/~simon/journal/20061211.html
  • Geoffrey E. Hinton's webpage, for references and details on restricted Boltzmann machines (RBM)
  • WEKA is a machine learning workbench developed at the University of Waikato that implements a large number of machine learning techniques. As it is implemented in Java, it will operate on a variety of platforms. The main WEKA page is at http://www.cs.waikato.ac.nz/ml/. See the forum for more information.
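
Several of the resources above concern SVD-style methods. As a rough sketch of the general idea, and not the code of any of the linked implementations, the example below fits each rating as a global mean plus the dot product of a user vector and a movie vector, using stochastic gradient descent over the known ratings only. All names and parameter values (train_factors, predict, model_rmse, the learning rate, the regularization constant) are assumptions chosen for illustration, and all factors are updated together for simplicity.

```python
import random


def train_factors(triples, num_users, num_movies, num_factors=2,
                  learning_rate=0.02, regularization=0.02, epochs=300, seed=0):
    """Fit rating ~ mean + dot(user_vector, movie_vector) by stochastic
    gradient descent over the known (user, movie, rating) triples."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in triples) / len(triples)
    # Small random starting values break the symmetry between factors.
    users = [[rng.uniform(-0.1, 0.1) for _ in range(num_factors)]
             for _ in range(num_users)]
    movies = [[rng.uniform(-0.1, 0.1) for _ in range(num_factors)]
              for _ in range(num_movies)]
    for _ in range(epochs):
        for u, m, r in triples:
            uf, mf = users[u], movies[m]
            err = r - (mean + sum(a * b for a, b in zip(uf, mf)))
            for k in range(num_factors):
                uk, mk = uf[k], mf[k]
                # Gradient step on the squared error, with weight decay.
                uf[k] += learning_rate * (err * mk - regularization * uk)
                mf[k] += learning_rate * (err * uk - regularization * mk)
    return mean, users, movies


def predict(model, user, movie):
    """Predicted rating of one (user, movie) pair."""
    mean, users, movies = model
    return mean + sum(a * b for a, b in zip(users[user], movies[movie]))


def model_rmse(model, triples):
    """Root-mean-square error of the model on the given triples."""
    total = sum((predict(model, u, m) - r) ** 2 for u, m, r in triples)
    return (total / len(triples)) ** 0.5
```

On the real data set the plain Python loops above would be far too slow; the sketch is only meant to make the update rule concrete before it is reimplemented on top of a compact data layout.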