My research interests include

- Machine Learning; Pattern Recognition
- Swarm Intelligence; Ant Colony Optimization
- Information Theory; Error-Correcting Codes
- Probability; Bayesian Networks; Monte Carlo methods
- Computational Complexity; Cryptography ...

Listed below are several projects I have worked on or am currently working on:

We designed a family of random coordinate descent (RCD) algorithms to directly minimize the training error for perceptron learning. Compared with other algorithms, our RCD algorithms are efficient and usually achieve the lowest training error. Owing to this excellent in-sample performance, RCD algorithms work well with boosting algorithms such as AdaBoost. We tested our algorithms on 12 real-world and artificial datasets and obtained satisfactory results. (report)
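As a rough illustration of the idea (a minimal sketch of my own, not the actual RCD algorithms from the report), the following Python snippet minimizes a perceptron's 0/1 training error on toy data by repeatedly picking a random coordinate, trying a random step along it, and keeping the step whenever it does not increase the training error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nearly-separable data with labels in {-1, +1}
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = np.sign(X @ w_true + 0.1 * rng.normal(size=100))

def training_error(w):
    """Fraction of examples the perceptron sign(w . x) misclassifies."""
    return np.mean(np.sign(X @ w) != y)

# Random coordinate descent on the (non-differentiable) 0/1 training error:
# perturb one randomly chosen coordinate at a time, accept non-worsening moves.
w = np.zeros(3)
best_err = training_error(w)
for _ in range(2000):
    i = rng.integers(3)
    w_new = w.copy()
    w_new[i] += rng.normal()      # random step along coordinate i
    err = training_error(w_new)
    if err <= best_err:           # keep moves that do not increase error
        w, best_err = w_new, err

print(best_err)
```

The point of working with coordinates is that the 0/1 loss is flat almost everywhere, so gradient methods do not apply directly; the actual RCD algorithms choose the step along each coordinate more carefully than this random-step sketch does.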

Learning with complex real-world data is further complicated by the fact that outliers and noisy examples regularly occur in the given data. The situation is like teaching a child quantum mechanics with an erroneous textbook. The theory and techniques we are developing, dubbed data pruning, focus on making learning with complex noisy data easier and more accurate. Ideally, in the analogy of teaching quantum mechanics, our techniques would automatically fix errors in the textbook and replace some parts with their counterparts in Newtonian mechanics, if the latter are easier to grasp. Our initial results show that data pruning is very promising and improves the generalization performance of learning. (poster, paper)

The superior out-of-sample performance of AdaBoost has been attributed to the fact that it minimizes a cost function based on margin. In order to examine how the cost function affects the out-of-sample performance, we apply several more sophisticated optimization techniques directly to the cost function. When the AdaBoost exponential cost function is optimized, our methods generally yield much lower cost and training error but higher test error, which implies that the exponential cost is vulnerable to overfitting. With the optimization power gained, we can adopt more "regularized" cost functions that have better out-of-sample performance but are difficult to optimize. Our experiments demonstrate that with suitable cost functions, our methods can achieve better out-of-sample performance. (paper, webpage at CNSE)
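To make the "optimize the cost directly" idea concrete, here is a minimal sketch (my own toy example, not the optimization techniques from the paper) that applies plain gradient descent to the AdaBoost-style exponential cost of a linear model's margins:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy separable data with labels in {-1, +1}
X = rng.normal(size=(100, 2))
y = np.sign(X @ np.array([1.0, -1.0]))

def exp_cost(w):
    """Exponential cost over the margins y * (w . x), as in AdaBoost."""
    return np.mean(np.exp(-y * (X @ w)))

def grad(w):
    """Gradient of exp_cost with respect to w."""
    m = y * (X @ w)
    return -(X * (y * np.exp(-m))[:, None]).mean(axis=0)

# Plain gradient descent directly on the exponential cost.
w = np.zeros(2)
for _ in range(500):
    w -= 0.1 * grad(w)

print(exp_cost(w))   # well below 1.0, the cost at w = 0
```

Because the data is separable, the cost can be driven arbitrarily close to zero by growing the margins, which is exactly the regime where the exponential cost starts rewarding overconfidence and overfitting becomes a concern.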

We have looked at a concrete case study (the stick-pulling experiment) where multiple robots in an arena work on a task that cannot be done without collaboration. The results after learning show that the robots become specialized. This is quite interesting since we never explicitly reward diversity in our learning algorithm and there is no explicit communication among the robots. (poster, paper, webpage at CNSE)

We went on to measure the degree of specialization emergent from learning. Specialization is defined as the part of diversity that is driven by the need for performance improvement. We observed some interesting and sometimes counterintuitive results with the generalized stick-pulling experiments. (poster, paper, paper)

For the BCJR algorithm, a conjecture is that the probability density function of the log-likelihood ratio (*LLR*) is Gaussian with variance equal to twice its mean. My experiments show that this conjecture does not hold.