CS/CNS/EE 156b Learning Systems (Fall 2001)

Prof. Abu-Mostafa

My project is Letter Recognition, which has 15997 training samples and 4003 test samples. Here is my gzipped code: cs156bpj1_src.tgz (23K).

The best hypothesis I got was trained by

./testr letter_train 12000 3997 trainerr 5 nn 70 5 100 0.0001 1000

which means using AdaBoost.M2 with stochastic gradient descent (stopping at epoch 1000) to train on the first 12000 samples and obtain an aggregated hypothesis consisting of 100 (16-70-50-26) neural networks. It has a 2.75% classification error on the remaining 3997 samples.
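To make "aggregated hypothesis" concrete, here is a minimal sketch of how AdaBoost.M2 combines its component hypotheses into one prediction: each network's per-label plausibility is weighted by log(1/beta_t) and the label with the largest total wins. This is not taken from my source code; the function name, the list-based layout, and the toy numbers are assumptions made only for illustration.

```python
import math

def aggregate_m2(plausibilities, betas):
    """Return the label chosen by an AdaBoost.M2-style weighted vote.

    plausibilities[t][y]: plausibility in [0, 1] that hypothesis t assigns
                          to label y for one input (hypothetical layout).
    betas[t]:             the beta_t of hypothesis t, assumed to lie in (0, 1).
    """
    num_labels = len(plausibilities[0])
    scores = [0.0] * num_labels
    for h_t, beta_t in zip(plausibilities, betas):
        weight = math.log(1.0 / beta_t)  # smaller beta_t means a larger vote
        for y in range(num_labels):
            scores[y] += weight * h_t[y]
    # Final hypothesis: the label with the largest weighted plausibility.
    return max(range(num_labels), key=scores.__getitem__)

# Toy example with 3 hypotheses and 3 labels (made-up numbers).
toy_outputs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
toy_betas = [0.5, 0.1, 0.1]
toy_label = aggregate_m2(toy_outputs, toy_betas)
```

In the toy example the first hypothesis prefers label 0, but the other two have smaller beta values and therefore outvote it.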

You may download the huge hypothesis file (gzipped, 2.25M), which is the file passed as nn on the command line.

I estimated that the out-of-sample error lies within [2.24%, 3.26%] with 95% confidence. The actual test error on the 4003 test samples is about 2.348%. You can verify it by

./testr letter_test 4000 3 testerr 5 nn 70 -5 100 0.0001 1000
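The 95% interval quoted above is consistent with a normal approximation to the binomial applied to the 2.75% validation error on 3997 samples. The sketch below reproduces the numbers under that assumption; the function name is hypothetical, and this formula is one standard way to get such an interval rather than a claim about the exact bound used in the report.

```python
import math

def error_interval(err, n, z=1.96):
    """Normal-approximation confidence interval for an error rate.

    err: observed error rate on n held-out samples
    z:   normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    half_width = z * math.sqrt(err * (1.0 - err) / n)
    return err - half_width, err + half_width

low, high = error_interval(0.0275, 3997)
print(f"[{low:.2%}, {high:.2%}]")  # prints [2.24%, 3.26%]
```

The measured test error of 2.348% falls inside this interval.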

Note that my program doesn't allow the number of training or validation samples to be 0, so we have to split the test set into two parts and then combine the errors on them to get the test error. This is cumbersome, but it only has to be done once.
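Combining the two errors amounts to a sample-weighted average, which is the same as adding up the misclassified samples and dividing by the total. The sketch below shows this; the function name is hypothetical, and the per-part counts in the example are an assumed split chosen only because it is consistent with the 2.348% total (the actual per-part outputs of testr are not listed here).

```python
def combined_error(parts):
    """Combine error counts measured on disjoint parts of a test set.

    parts: list of (num_samples, num_misclassified) pairs, one per part.
    Returns the overall error rate on the union of the parts.
    """
    total_samples = sum(n for n, _ in parts)
    total_wrong = sum(wrong for _, wrong in parts)
    return total_wrong / total_samples

# Assumed split: 94 errors on the first 4000 samples, 0 on the last 3.
overall = combined_error([(4000, 94), (3, 0)])
print(f"{overall:.3%}")  # prints 2.348%
```

If the program reports error rates rather than counts, multiply each rate by its part's sample count first to recover the counts.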

Below you can find all my presentations and final report on this project.

I believe more training samples will lead to a better out-of-sample error. Even though the 12000 training samples are already abundant for this problem, the belief still holds: I trained on 15000 samples and got a test error of 1.949%, which is quite close to the best result ever obtained. Here is the hypothesis file (gzipped, 2.24M) and the plot of validation and test errors (eps).