Abstract.
This project focuses on both practical and theoretical aspects
of the monotonicity constraint in machine learning. Learning methods
which enforce monotonicity in models such as neural networks are
being developed. In addition, the flexibility and expressive power
of the class of monotonic binary output functions are analyzed
and quantified from a theoretical perspective.
Motivation
& Aims. Recent theoretical results in machine learning
have emphasized the importance of prior information. The situation
where we have certain beliefs or biases (also known as hints)
about the function to be learned has in the past been thought
of as merely one special case of the general, "assumption-free"
paradigm. However, many machine learning theorists are coming
to understand that the incorporation (implicit or explicit) of
priors is fundamental to any learning situation where we have
a hope of being successful. This new outlook motivates the development
of learning systems which capitalize efficiently on pieces of
prior information.
Monotonicity
is a property which can be ascribed to the target function in
many practical applications. For instance, consider the task of
learning to screen credit card applicants. Common sense tells
us that the higher a person's salary, the more creditworthy the
person is, all else being equal. What this means mathematically
is that creditworthiness is monotonically increasing in salary.
Monotonicity holds in numerous other situations, including many
problems in economic forecasting and medical diagnosis. In this
project, we seek to develop ways of incorporating monotonicity
constraints into the learning process and to understand the benefits
of doing so.
Research.
The
practical side of the research project is concerned with methods
for enforcing monotonicity constraints in machine learning models.
One technique implements monotonicity approximately by adding
a second term to the objective function to be optimized. The additional
term measures the degree to which the model currently obeys monotonicity.
Because this method is based on an alteration of the objective
function, it is compatible with the use of virtually any parametrized
learning model. This approach falls within the general learning-from-hints
paradigm pioneered by Abu-Mostafa. Efforts also focus on the design
of a learning model which adheres to the monotonicity constraint
exactly, i.e., by virtue of its functional form. This model, called
the monotonic network, consists of groups of hyperplanes upon
which maximum and minimum operations are applied.
Theoretical
investigations center around the analysis of the class of monotonic
binary output functions. Interestingly, the flexibility of this
class turns out to depend heavily on the input distribution. A
technique has been developed for estimating the capacity of monotonic
functions given a particular input distribution. The bounds derived
by Vapnik and Chervonenkis allow us to bound the out-of-sample
error of a model given its VC entropy. Direct estimation of the
VC entropy of the class of monotonic functions is computationally
infeasible, but a method for efficiently bounding the VC entropy
has been developed using results from graph theory.
Achievements.
Both
techniques of enforcing monotonicity have been applied to real-world
problems and have been shown to lead to statistically significant
improvements in performance over current benchmark machine learning
methods.
The
VC entropy of the class of monotonic functions has been bounded
for various realistic input distributions, demonstrating that
the enforcement of monotonicity can greatly reduce the VC entropy
of learning models such as neural networks and therefore decrease
out-of-sample error.
Publications/References
Monotonicity
Hints. J. Sill and Y. Abu-Mostafa. NIPS 9:634-640.
The
Capacity of Monotonic Functions. J. Sill. to appear
in Discrete Applied Mathematics, special issue on VC dimension.