Ordinal regression (ranking) lies between
multiclass classification and metric regression in the area
of supervised learning. The ordinal
regression problem can be thought of
as a multiclass problem with an ordering preference, or as a
regression problem without knowing much about the underlying metric.
It has many applications in social science and information
retrieval for matching human preferences, and has attracted
much attention in the machine learning community in recent years.
(see some introduction slides in my talk:
Caltech EE Pizza Meeting)
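A tiny illustration of why ordinal regression sits between the two settings: the labels are discrete ranks (the multiclass view), but the ordering makes some mistakes worse than others (the regression view). The ratings below are made up for illustration.

```python
# Hypothetical true and predicted ranks, e.g. movie ratings from 1 to 5.
true_ranks = [1, 3, 4, 2, 5]
pred_ranks = [1, 2, 4, 4, 5]

# Classification error treats every mistake equally (multiclass view).
clf_err = sum(t != p for t, p in zip(true_ranks, pred_ranks)) / len(true_ranks)

# Absolute error respects the ordering (regression view): predicting
# rank 4 for a true rank 2 costs more than predicting rank 3.
abs_err = sum(abs(t - p) for t, p in zip(true_ranks, pred_ranks)) / len(true_ranks)

print(clf_err)  # 0.4
print(abs_err)  # 0.6
```

The two error functions agree on which predictions are wrong but disagree on how wrong they are, which is exactly the gap that ordinal regression algorithms must handle.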
- Thresholded ensemble model:
We propose a thresholded ensemble model for ordinal regression
problems. The model consists of a weighted ensemble of confidence
functions and an ordered vector of thresholds. We derive novel
large-margin bounds of common error functions, such as the
classification error and the absolute error. We also propose new and
simple approaches for constructing thresholded ensembles. These
approaches have comparable performance to state-of-the-art algorithms,
but enjoy the benefit of faster training.
(see our paper: ALT '06)
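A minimal sketch of how a thresholded ensemble predicts a rank, following the formulation above: a weighted ensemble of confidence functions produces a score, and an ordered vector of thresholds cuts the score into ranks. The confidence functions, weights, and thresholds here are invented for illustration.

```python
def ensemble_confidence(x, hypotheses, weights):
    """Weighted vote of the confidence functions."""
    return sum(w * h(x) for h, w in zip(hypotheses, weights))

def predict_rank(score, thresholds):
    """Rank = 1 + number of thresholds the score exceeds.
    The thresholds must be sorted in increasing order."""
    return 1 + sum(score > t for t in thresholds)

# Three toy confidence functions on a scalar input (hypothetical).
hypotheses = [lambda x: x, lambda x: x - 1.0, lambda x: 2.0 * x]
weights = [0.5, 0.3, 0.2]
thresholds = [-0.5, 0.5, 1.5]  # K - 1 = 3 thresholds, so 4 possible ranks

score = ensemble_confidence(1.0, hypotheses, weights)  # 0.5 + 0.0 + 0.4 = 0.9
print(predict_rank(score, thresholds))  # 0.9 exceeds two thresholds: rank 3
```

The ordered thresholds are what let the model respect the ranking structure: a larger score can never map to a smaller rank.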
We propose a new reduction framework that systematically
transforms ordinal regression problems to the more well-studied
binary classification problems. Then, any binary classification
algorithm can be used for ordinal regression, and improvements in
binary classification can be immediately inherited for ordinal regression.
In addition, many existing
ordinal regression algorithms and models,
including the thresholded ensemble model, perceptron ranking,
and support vector ordinal regression,
can be viewed as special cases of the reduction framework.
From the theoretical perspective,
the framework leads to simple and intuitive proofs of ordinal
regression theorems, and from the practical perspective, the framework
performs very well on benchmark data sets.
(see our paper: NIPS '06)
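The core of the reduction can be sketched in a few lines: each ordinal example (x, y) with rank y in {1, ..., K} becomes K - 1 binary questions of the form "is the rank greater than k?", and the rank is recovered by counting "yes" answers. This sketch omits details such as the per-example weights that encode the cost of each mistake.

```python
K = 4  # number of ranks in this toy setting

def to_binary(x, y):
    """Expand one ordinal example into K - 1 extended binary examples
    ((x, k), label), where the label answers "is the rank above k?"."""
    return [((x, k), 1 if y > k else -1) for k in range(1, K)]

def to_rank(binary_predict, x):
    """Recover a rank by counting the binary classifier's +1 answers."""
    return 1 + sum(binary_predict((x, k)) == 1 for k in range(1, K))

# Expanding a rank-3 example produces answers "yes, yes, no".
print(to_binary([0.2, 0.7], 3))
# -> [(([0.2, 0.7], 1), 1), (([0.2, 0.7], 2), 1), (([0.2, 0.7], 3), -1)]
```

Any binary learner plugged into `binary_predict` yields an ordinal regressor, which is why improvements on the binary side transfer directly.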
- From Group Research Page:
There are two important phenomena in real-world learning problems. First,
they may contain noisy or mislabeled data, which can mislead the learning
algorithm. Second, they may contain data that are too complex, making it
hard for the algorithm to extract the essence. In both cases, the
performance of the algorithm degrades because of the incorrect or overly
complex data. In order to obtain better generalization ability, we want to
prune such unfavorable data before launching the learning procedure. This
would also help set up analysis tools in areas where we have abundant data.
We have found that several learning algorithms, such as the rho-Learning
scenario, AdaBoost, and Support Vector Machines, can offer some help in
identifying unfavorable data (see our paper: PKDD '05).
We are trying to justify the data selection framework from the learning perspective,
and to build useful selection tools by understanding the behavior of different
learning algorithms in various environments.
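One simple heuristic in the spirit of pruning unfavorable data (not the specific method of the PKDD '05 paper): flag a point as suspicious when its label disagrees with the majority of its nearest neighbors, then train only on the remaining points. The data below are synthetic, with one deliberately mislabeled point.

```python
def neighbors_disagree(data, i, k=3):
    """True if the label of point i disagrees with the majority label
    of its k nearest neighbors (1-D features, Euclidean distance)."""
    x_i, y_i = data[i]
    others = sorted(
        (abs(x_i - x_j), y_j) for j, (x_j, y_j) in enumerate(data) if j != i
    )
    votes = [y for _, y in others[:k]]
    majority = 1 if sum(votes) > 0 else -1
    return majority != y_i

# A clean -1/+1 pattern with one mislabeled point at x = 0.9.
data = [(0.1, -1), (0.2, -1), (0.3, -1), (0.8, 1), (0.9, -1), (1.0, 1), (1.1, 1)]
pruned = [p for i, p in enumerate(data) if not neighbors_disagree(data, i)]
print(len(pruned))  # the mislabeled point is removed, 6 points remain
```

The interesting research question is when such pruning actually helps generalization, which is where understanding the behavior of specific learning algorithms comes in.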
- Some Studies:
I keep working with Neural Networks for regression (function fitting)
and their behavior in noisy environments. Traditional statistical
studies have provided useful tools for analyzing regression
results. I hope that by understanding the behavior
of Neural Networks through these tools, we can also understand
the effect of unfavorable data, and thus find ways to deal with it.
Support Vector Machine is an algorithm
for pattern classification/regression. The two
key ideas inside, "large margin" and "kernel mapping",
make SVM a powerful machine learning tool.
I was in the group of
Prof. Lin at National
Taiwan University. We studied various aspects
of SVMs, both in theory
and in practice.
I am still interested in topics related to this area.
- Infinite Ensemble Learning:
Traditional ensemble learning algorithms, such as boosting,
aggregate a finite number of hypotheses.
However, it is not clear whether an ensemble with an infinite
number of hypotheses should be used, and constructing such an
ensemble is a challenging problem.
Our work applies SVM optimization machinery and the kernel trick to ensemble learning.
The kernel trick makes it possible to embed an infinite number of
hypotheses into a simple kernel computation. The work results in a novel learning framework
that constructs an infinite ensemble. The
framework provides further understanding
for designing new kernels, explaining existing kernels, and comparing
boosting with SVM.
When our SVM-based framework and AdaBoost (a popular boosting algorithm)
are designed with the same set of hypotheses, our framework outperforms
AdaBoost experimentally. Further analysis shows that the sparsity/finiteness
property of AdaBoost is the key to explaining the difference.
On the other hand, the novel framework aggregates an infinite number of hypotheses,
and does not suffer from the limitation of sparsity/finiteness. (see our papers:
my Master's Thesis)
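The key idea can be sketched numerically in one dimension. Consider infinitely many decision stumps s_q(x) = sign(x - q) with thresholds q in [L, R]: the inner product of their outputs, integrated over all q, collapses to a closed-form kernel, (R - L) - 2|x - x'|. An SVM with this kernel therefore implicitly aggregates every stump at once. This is a simplified 1-D version; the constants and normalization in the actual papers differ.

```python
import numpy as np

L_, R_ = 0.0, 1.0
qs = np.linspace(L_, R_, 100001)  # dense threshold grid for the integral

def stump_inner_product(x, xp):
    """Riemann approximation of the integral of sign(x - q) * sign(xp - q)
    over all stump thresholds q in [L_, R_]."""
    vals = np.sign(x - qs) * np.sign(xp - qs)
    return vals.mean() * (R_ - L_)

x, xp = 0.3, 0.7
approx = stump_inner_product(x, xp)
exact = (R_ - L_) - 2 * abs(x - xp)  # closed-form "stump kernel" in 1-D
print(round(approx, 3), round(exact, 3))  # both close to 0.2
```

Because the infinite aggregation reduces to a simple kernel evaluation, the ensemble never needs to be enumerated, which is exactly what the kernel trick buys here.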
Feel free to contact me: "htlin" at "csie.ntu.edu.tw"