This is the last snapshot before I graduated from Caltech.
- AdaBoost_ERP added, mentioned in my paper Multiclass Boosting with Repartitioning.
- CrossVal, vFoldCrossVal, and HoldoutCrossVal added. They can also be used as learning models. See test/testsvm.cpp for a demonstration.
- Ordinal_BLE added, which was developed in the early stage of the paper Ordinal Regression by Extended Binary Classification. It is outdated and will probably be rewritten in the future to keep up with the paper.
- SVM changes: support vectors are copied out from LIBSVM (which can save some memory usage); Kernel, and thus SVM, can be saved/loaded.
- DataFeeder can add flipping noise (set_train_noise()) and can be reset (reset()).
- Possible to convert between some boosting models.
- (multiclass) Allow data sets to be loaded after an ECOC table is set.
- MultiClass_ECOC and AdaBoost_ECOC added. See test/multi.cpp for an example with one-vs-all.
- The margin()/signed_margin() functions scattered in different classes were consolidated into four functions in LearnModel: margin(), margin_of(), min_margin(), and margin_norm(). The first three give unnormalized margins and the last is the normalization term.
- DataFeeder added. It is handy when data splitting/normalization is needed.
- SVM needs a small modification to work with MultiClass_ECOC.
- Bug fixes: SVM::w_norm(), an invalid cache in Boosting::initialize(), better Boosting::get_output() when no cache is used, and a typo in RBF::matrix.
- Perceptron added. Implemented several perceptron learning algorithms mentioned in my paper Perceptron Learning with Random Coordinate Descent.
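A rough sketch of the random-coordinate-descent idea (not Lemga's Perceptron code; the fixed step size and the "pocket" bookkeeping here are illustrative choices): pick one coordinate at random, nudge that single weight with a perceptron-style update, and keep the best weight vector seen so far.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Count training errors of a linear classifier sign(w . x); x includes a bias term.
int count_errors(const std::vector<std::vector<double>>& X,
                 const std::vector<int>& y, const std::vector<double>& w) {
    int err = 0;
    for (size_t i = 0; i < X.size(); ++i) {
        double s = 0;
        for (size_t j = 0; j < w.size(); ++j) s += w[j] * X[i][j];
        if ((s > 0 ? 1 : -1) != y[i]) ++err;
    }
    return err;
}

// Random coordinate descent with a "pocket": each iteration picks one
// coordinate at random, applies a perceptron-style update to that single
// weight, and remembers the best weight vector seen so far.
std::vector<double> rcd_perceptron(const std::vector<std::vector<double>>& X,
                                   const std::vector<int>& y,
                                   int iters, unsigned seed = 0) {
    size_t d = X[0].size();
    std::vector<double> w(d, 0.0), best = w;
    int best_err = count_errors(X, y, w);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<size_t> coord(0, d - 1);
    for (int t = 0; t < iters; ++t) {
        size_t j = coord(rng);
        double g = 0;             // summed update for coordinate j
        for (size_t i = 0; i < X.size(); ++i) {
            double s = 0;
            for (size_t k = 0; k < d; ++k) s += w[k] * X[i][k];
            if ((s > 0 ? 1 : -1) != y[i]) g += y[i] * X[i][j];
        }
        w[j] += 0.1 * g;          // fixed step along the chosen coordinate
        int err = count_errors(X, y, w);
        if (err < best_err) { best_err = err; best = w; }
    }
    return best;
}
```

The pocket guarantees the returned weights are never worse than the starting point, even though a single-coordinate update may temporarily increase the error.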
- LPBoost added. Hsuan-Tien Lin contributed the code, which uses GLPK.
- Kernel class added, since kernels can be used for algorithms other than SVM.
- SVM: can be cloned. More inside information can be obtained, such as the 2-norm of the weight vector, the support vectors, and the coefficients.
- Boosting, including AdaBoost and CGBoost.
- load_data() can auto-detect the input dimension.
- randn() added.
- c_error() and r_error() are in LearnModel.
- Pulse bug (introduced in 0.1 beta) fixed: Pulse may fail to choose the optimal hypothesis under some conditions.
- MgnBoost (Breiman's arc-gv) added. Test code is added to test/adabst.cpp.
- boosting::margin() gives the margin of an individual training example, or the minimal margin of the training set.
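Concretely, for a weighted vote of base hypotheses, the margin of example i is y_i Σ_t α_t h_t(x_i), with Σ_t α_t as the normalization term. A standalone sketch (not Lemga's boosting::margin implementation; names and the folded-in normalization are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Margin of one training example under a weighted vote of base hypotheses:
//   margin_i = y_i * sum_t alpha_t * h[t][i] / sum_t alpha_t
// where h[t][i] is the +1/-1 output of hypothesis t on example i.
double example_margin(const std::vector<std::vector<int>>& h,
                      const std::vector<double>& alpha,
                      const std::vector<int>& y, size_t i) {
    double vote = 0, norm = 0;
    for (size_t t = 0; t < alpha.size(); ++t) {
        vote += alpha[t] * h[t][i];
        norm += alpha[t];
    }
    return y[i] * vote / norm;
}

// Minimal margin over the whole training set.
double min_margin(const std::vector<std::vector<int>>& h,
                  const std::vector<double>& alpha,
                  const std::vector<int>& y) {
    double m = example_margin(h, alpha, y, 0);
    for (size_t i = 1; i < y.size(); ++i) {
        double mi = example_margin(h, alpha, y, i);
        if (mi < m) m = mi;
    }
    return m;
}
```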
- Stump. Incomplete.
- I haven't tested the new code with Visual C++ .NET.
- Aggregation renamed to Aggregating.
- CGBoost added. CGBoost is better than AdaBoost in optimizing cost functions. For details please refer to the CGBoost technical report (note that small modifications are required in _conjugate_gradient (optimize.h) in order to set β=0 for the first several iterations).
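Setting β=0 amounts to running plain steepest descent for the first few iterations before switching to conjugate gradient. A self-contained sketch on a quadratic objective (illustrative only; _conjugate_gradient in optimize.h is not reproduced here, and cg_min is a made-up name):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimize f(w) = 0.5 w'Aw - b'w with Polak-Ribiere conjugate gradient,
// forcing beta = 0 (plain steepest descent) for the first
// zero_beta_iters iterations.
std::vector<double> cg_min(const std::vector<std::vector<double>>& A,
                           const std::vector<double>& b,
                           int iters, int zero_beta_iters) {
    size_t n = b.size();
    std::vector<double> w(n, 0.0), g(n), g_old(n), d(n, 0.0);
    for (int t = 0; t < iters; ++t) {
        // gradient g = A w - b
        for (size_t i = 0; i < n; ++i) {
            g[i] = -b[i];
            for (size_t j = 0; j < n; ++j) g[i] += A[i][j] * w[j];
        }
        double beta = 0.0;
        if (t >= zero_beta_iters && t > 0) {
            double num = 0, den = 0;       // Polak-Ribiere formula
            for (size_t i = 0; i < n; ++i) {
                num += g[i] * (g[i] - g_old[i]);
                den += g_old[i] * g_old[i];
            }
            if (den > 0) beta = num / den;
        }
        for (size_t i = 0; i < n; ++i) d[i] = -g[i] + beta * d[i];
        // exact line search for a quadratic: step = -(g'd) / (d'Ad)
        double gd = 0, dAd = 0;
        for (size_t i = 0; i < n; ++i) {
            gd += g[i] * d[i];
            double Ad = 0;
            for (size_t j = 0; j < n; ++j) Ad += A[i][j] * d[j];
            dAd += d[i] * Ad;
        }
        if (dAd <= 0) break;               // converged or invalid direction
        double step = -gd / dAd;
        for (size_t i = 0; i < n; ++i) w[i] += step * d[i];
        g_old = g;
    }
    return w;
}
```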
- SVM and test code testsvm added. LIBSVM, modified to support weighted training examples, is used for the actual work. Currently only SVM classification with the RBF kernel is supported. Serialization/unserialization has not been implemented yet.
- lemga::cost (cost.h) added. I try to separate the cost functions from the learning/optimization methods; this is a temporary solution before functors are used.
- Pulse and Stump: training a pulse function now takes O(n) time.
- Pulse parameters.
- dataset::replace and member Boosting::min_err.
- save() and load() replaced by a much better serialization/unserialization implementation. Operator >> is used for saving models, and << for loading. create(istream) can create an unknown-type object from an input stream. (Thus the base model in class Aggregation is no longer needed when loading models.)
- load_data() accepts an input stream instead of a FILE* handle.
- Pulse added. It is a multi-transition phase (step) function. The best hypothesis with a number of transitions equal to or less than a given limit is returned. When the limit is 1, a pulse is almost the same as a stump (the only difference is that a pulse may return a hypothesis with no transitions at all). The code has been tuned so that it is even faster than a stump when the limit is 1.
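Evaluating a pulse hypothesis is just counting sign flips at or below the input. A minimal sketch (illustrative, not the tuned Lemga code; pulse_output and its arguments are made up for this example):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A "pulse" is a multi-transition step function of one input variable:
// it outputs start_sign (+1 or -1) below the first transition threshold
// and flips sign at each transition. A stump is the one-transition case;
// an empty transition list gives the "no transitions at all" hypothesis.
int pulse_output(double x, const std::vector<double>& transitions, int start_sign) {
    // Count the transitions at or below x; each one flips the sign.
    // transitions must be sorted in increasing order.
    int flips = std::upper_bound(transitions.begin(), transitions.end(), x)
              - transitions.begin();
    return (flips % 2 == 0) ? start_sign : -start_sign;
}
```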
- _line_search will stop early if a non-descending direction is met. This change affects conjugate gradient, boosting in the functional space, and the training of neural networks. For example, convex boosting now returns a very large number as the cost when empty, to avoid a non-descending direction at the first step.
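The early stop amounts to rejecting any direction whose directional derivative is non-negative. A generic backtracking sketch (not Lemga's _line_search; the names and the simple decrease test are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Backtracking line search that stops early when the search direction d
// is non-descending, i.e. when the directional derivative g.d is >= 0.
// f is the objective and g its gradient at w; returns the chosen step
// size (0 means "direction rejected, keep the current point").
template <typename F>
double line_search(F f, const std::vector<double>& w,
                   const std::vector<double>& g, const std::vector<double>& d,
                   double step = 1.0, double shrink = 0.5, int max_tries = 30) {
    double slope = 0;
    for (size_t i = 0; i < d.size(); ++i) slope += g[i] * d[i];
    if (slope >= 0) return 0;         // early stop: not a descent direction
    double f0 = f(w);
    for (int k = 0; k < max_tries; ++k) {
        std::vector<double> w2(w);
        for (size_t i = 0; i < w.size(); ++i) w2[i] += step * d[i];
        if (f(w2) < f0) return step;  // accept any decrease in this sketch
        step *= shrink;               // otherwise shrink the step and retry
    }
    return 0;
}
```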
- REGISTER_CREATOR simplifies the object creator registration.
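The pattern behind REGISTER_CREATOR and create(istream) can be sketched as a name-to-factory registry. Everything below (the registry, the Stump example, the hand-expanded registration) is illustrative, not Lemga's actual macro or class hierarchy:

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// A minimal object-creator registry: each class registers a factory under
// its name, and create(istream) reads the name and dispatches, so an
// object of unknown concrete type can be rebuilt from a stream.
struct LearnModel {
    virtual ~LearnModel() {}
    virtual std::string id() const = 0;
};

using Creator = std::function<std::unique_ptr<LearnModel>(std::istream&)>;

std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> r;
    return r;
}

std::unique_ptr<LearnModel> create(std::istream& is) {
    std::string name;
    is >> name;                      // first token names the concrete class
    auto it = registry().find(name);
    return it == registry().end() ? nullptr : it->second(is);
}

// Example concrete model: a stump storing only its threshold.
struct Stump : LearnModel {
    double threshold = 0;
    std::string id() const override { return "Stump"; }
};

// What a REGISTER_CREATOR-style macro would expand to, written by hand.
const bool stump_registered = (registry()["Stump"] =
    [](std::istream& is) {
        auto m = std::make_unique<Stump>();
        is >> m->threshold;
        return std::unique_ptr<LearnModel>(std::move(m));
    }, true);
```

Because the registry maps names to factories at run time, the loader no longer needs a pre-constructed base model to know which concrete class to build.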
- Gradient descent with weight decay (_gd_weightdecay) added.
- id() returns a const string instead of a char*.
- Test code (AdaBoost with Pulse) and a model file checker (showlm) added.
- copy() renamed to clone().
- Bagging bag(n_in, n_out) simply becomes Bagging bag.
- Boosting (via BoostWgt and _boost_gd) reimplemented so that conjugate gradient and some variants of gradient descent are possible (in the functional space).
- Object creator registration (_register_creator); constructors accepting istream as an argument added (interface only); version() was renamed to id() and no longer contains version information.
- vectorop became lemga::op. It serves only for generic optimization in Lemga; only a small set of operations is needed for optimization.
- _shared_ptr.
- create() added as a virtual constructor.
- AdaBoost removed.
- Cascade class added.
The code rewriting is almost done and I've tested Lemga in one project (alphaBoost) with GCC 2.96, 3.0.1, and 3.2.1. Models and algorithms currently coded are: