Abstract.
The problem of overfitting the data is attacked by using the Bin
Model analysis. This provides a method of bounding generalization
error without sacrificing valuable training data.
Motivation
& Aims. In the standard learning scenario, we are given
a finite set of examples with targets that were generated according
to some rule, and the task is to extract the rule from the examples.
It is often the case that we can fit the data well, but have failed
to extract the rule so that outputs given on new data are incorrect.
This is the problem of overfitting the data.
We
would like to give a measure of how well we expect our hypothesis
to perform on new data, that is, what sort of generalization we
can achieve. This is most commonly done by reserving a portion
of the data set for directly estimating the generalization error.
When only a small number of examples are available, this can be
undesirable. The Bin Model analysis attempts to provide a method
of bounding generalization error without sacrificing valuable
training data.
Research
& Achievements. The
Bin Model - We consider binary classification problems - the
target function f(x) maps input vectors x to {0,1}. We represent
each candidate function g(x) by a bin. Each bin is characterized
by a parameter
,
which is the probability that g(x) disagrees with f(x) on a point
randomly chosen from the input space.
A
bin can be visualized as containing red and green marbles. The
probability of picking a red marble is
.
When we draw i.i.d. examples from the input space and test them
on the functions f and g, we are virtually drawing marbles from
a bin and checking their colors. We can consider each sample as
a Bernoulli trial with probability
of failure (red marble) and probability (1-
)
of success (green marble).
When
we look at a class of hypothesis functions - our learning model,
we have a set of
's
which we can consider as being drawn from a distribution, p(
).
The
-distribution indicates
the suitability of the learning model for approximating the target
function.
It
is worth noticing that the
-distribution
embodies all of the information needed to analyze the learning
problem. We do not need the information about the specific forms
that the candidate functions assume. Nor do we need to know anything
about the inputs and input distribution. All that we do is to
draw marbles (equivalent to i.i.d. inputs) from the bins characterized
by the
-distribution.
This makes it possible to model learning machines in a very simple
way and without loss of generality.
Because
of its simplicity, the bin model is useful for giving us ideas
about generalization. The
-distribution
summarizes the generalization behavior entirely, so, although
it is dependent on the learning model and target function, we
can expect functions with similar
-distributions
to have similar generalization performance, regardless of other
model complexity measures. This result, although contrary to common
conceptions about complexity and overfitting, has been experimentally
verified, and is illustrated by the following figure.
Currently
under investigation are ways of estimating the
distribution
in practical learning situations, applications of the Bin Model
analysis to continuous and noisy target functions, and the effects
of certain learning algorithms on the generalization results.
Publications/References
Bin
Model for Neural Networks. Abu-Mostafa,
Y. and Song, X. In: Proceedings of the International Conference
on Neural Information Processing (ICONIP'96), Hong Kong, 1996,
pp. 169-173.