This is the last snapshot before I graduated from Caltech.
AdaBoost_ERP added; it is mentioned in my paper Multiclass Boosting with Repartitioning.
Cross-validation models such as HoldoutCrossVal added. They can also be used as learning models. See test/testsvm.cpp for a demonstration.
Ordinal_BLE, which was developed in the early stage of the paper Ordinal Regression by Extended Binary Classification, is outdated and will probably be rewritten in the future to keep up with the paper.
SVM changes: support vectors are copied out from LIBSVM (which can save some memory usage); SVM can be saved/loaded.
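A minimal sketch of saving and loading an SVM through the stream-based serialization described in an older entry below (operator << and create(istream)); the header name, the lemga namespace, and the LearnModel base class are assumptions, not guaranteed by this changelog:

    #include <fstream>
    #include "svm.h"   // header name assumed

    void save_then_load (const lemga::SVM& s) {
        std::ofstream ofs("model.lm");
        ofs << s;                         // operator<< saves the model
        ofs.close();

        std::ifstream ifs("model.lm");
        // create() reconstructs an object of unknown type from the stream
        lemga::LearnModel* m = lemga::create(ifs);
        delete m;
    }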
DataFeeder can add flipping noise (reset());
Possible to convert between some boosting models;
(multiclass) Allow data sets to be loaded after an ECOC table is set.
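For intuition, flipping noise inverts each binary label with a fixed probability p; a minimal sketch, not DataFeeder's actual code:

    #include <cstdlib>
    #include <vector>

    // Flip each label (assumed to be -1/+1) with probability p.
    void add_flip_noise (std::vector<int>& y, double p) {
        for (size_t i = 0; i < y.size(); ++i)
            if (std::rand() / (RAND_MAX + 1.0) < p)
                y[i] = -y[i];
    }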
AdaBoost_ECOC added. See test/multi.cpp for an example with one-vs-all.
The signed_margin() functions scattered in different classes were consolidated into four functions, the last being margin_norm(). The first three give unnormalized margins and the last is the normalization term.
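For example, dividing an unnormalized margin by the normalization term gives a normalized margin; a sketch, assuming margin() takes an example index (the template sidesteps Lemga's exact class names):

    // Hypothetical helper: normalized margin of training example i.
    template <typename Model>
    double normalized_margin (const Model& lm, unsigned i) {
        return lm.margin(i) / lm.margin_norm();
    }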
DataFeeder added. It is handy when data splitting/normalization is needed.
Programs using SVM need a small modification.
Bug fixes: SVM::w_norm(); an invalid cache in Boosting::get_output() when no cache is used; and a typo.
Perceptron added. Implemented several perceptron learning algorithms mentioned in my paper Perceptron Learning with Random Coordinate Descent.
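For background, all of these algorithms share the perceptron's misclassification-driven update; a textbook epoch is sketched below (the plain perceptron, not the random coordinate descent variants, and none of the names are Lemga's):

    #include <vector>

    // One pass over the training set: update w on every misclassified
    // example. x[i] is a feature vector, y[i] is its -1/+1 label.
    void perceptron_epoch (std::vector<double>& w,
                           const std::vector< std::vector<double> >& x,
                           const std::vector<int>& y, double eta) {
        for (size_t i = 0; i < x.size(); ++i) {
            double s = 0;
            for (size_t d = 0; d < w.size(); ++d) s += w[d] * x[i][d];
            if (s * y[i] <= 0)                       // misclassified
                for (size_t d = 0; d < w.size(); ++d)
                    w[d] += eta * y[i] * x[i][d];    // standard update
        }
    }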
LPBoost added. Hsuan-Tien Lin contributed the code, which uses GLPK.
Kernel class was added, since kernels can be used for algorithms other than SVM.
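A typical kernel such a class would represent is the RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2), the one used by SVM below; a standalone sketch, not the Kernel class interface:

    #include <cmath>
    #include <vector>

    // RBF kernel value between two feature vectors of equal length.
    double rbf_kernel (const std::vector<double>& a,
                       const std::vector<double>& b, double gamma) {
        double d2 = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            double d = a[i] - b[i];
            d2 += d * d;
        }
        return std::exp(-gamma * d2);
    }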
SVM: can be cloned; more inside information can be obtained, such as the 2-norm of the weight vector, the support vectors, and the coefficients.
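A sketch of querying that information; w_norm() is named elsewhere in this changelog and clone() is implied by "can be cloned", while the covariant return type and everything else here are assumptions:

    // Hypothetical usage on a trained SVM-like model.
    template <typename SVMModel>
    void inspect (const SVMModel& s) {
        double wn = s.w_norm();       // 2-norm of the weight vector
        SVMModel* copy = s.clone();   // independent copy of the model
        // ... read support vectors / coefficients from copy or s ...
        delete copy;
        (void) wn;
    }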
load_data() can auto-detect the input dimension.
Pulse bug (introduced in 0.1 beta) fixed: Pulse might fail to choose the optimal hypothesis under some conditions.
MgnBoost (Breiman's arc-gv) added. Test code added to test/adabst.cpp.
boosting::margin() gives the margin of an individual training example, or the minimal margin of the training set.
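Concretely, for an example (x, y) with y in {-1,+1} and an ensemble sum_t w_t h_t, the margin is y times the weighted vote, and the minimal margin is the smallest such value over the training set; a small illustrative routine, not boosting::margin() itself:

    #include <vector>

    // hx[t] holds h_t(x) in {-1,+1}; w holds the hypothesis weights.
    double ensemble_margin (double y, const std::vector<double>& w,
                            const std::vector<double>& hx) {
        double s = 0;
        for (size_t t = 0; t < w.size(); ++t)
            s += w[t] * hx[t];
        return y * s;   // positive iff the ensemble classifies (x, y) correctly
    }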
I haven't tested the new code with Visual C++.NET.
CGBoost added. CGBoost is better than AdaBoost in optimizing cost functions. For details please refer to the CGBoost technical report (note that small modifications are required in _conjugate_gradient (optimize.h) in order to set β = 0 for the first several iterations).
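For reference, the change amounts to forcing β = 0 in the direction update d ← -g + β d during the first several iterations, which makes those steps plain gradient descent. A sketch with the Fletcher-Reeves β; the names are illustrative, not those in optimize.h:

    #include <numeric>
    #include <vector>

    // One conjugate-gradient direction update; beta is zeroed for the
    // first n_plain iterations, giving plain gradient descent there.
    void update_direction (std::vector<double>& dir,
                           const std::vector<double>& g,
                           const std::vector<double>& g_prev,
                           size_t iter, size_t n_plain) {
        double beta = 0;
        if (iter >= n_plain) {
            double gg = std::inner_product(g.begin(), g.end(), g.begin(), 0.0);
            double pp = std::inner_product(g_prev.begin(), g_prev.end(),
                                           g_prev.begin(), 0.0);
            if (pp > 0) beta = gg / pp;   // Fletcher-Reeves
        }
        for (size_t i = 0; i < g.size(); ++i)
            dir[i] = -g[i] + beta * dir[i];
    }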
SVM and test code testsvm added. LIBSVM, modified to support weighted training examples, is used for actual work. Currently only SVM classification with RBF kernel is supported. Serialization/unserialization has not been implemented yet.
lemga::cost (cost.h) added. I try to separate the cost functions from the learning/optimization methods, and this is a temporary solution before functors are used.
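A sketch of the intended separation, using AdaBoost's exponential cost as a standalone functor; the name and interface are illustrative, not those of cost.h:

    #include <cmath>

    struct exponential_cost {
        // cost of output f on an example with label y in {-1,+1}
        double operator() (double f, double y) const {
            return std::exp(-y * f);
        }
    };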
Stump and Pulse: training a pulse function now takes O(n) time.
load() replaced by a much better serialization/unserialization implementation. Operator << is used for saving models, and create(istream) can create an unknown-type object from an input stream. (Thus the base model in class Aggregation is no longer needed when loading models.)
load_data() accepts an input stream instead of a file name.
Pulse added. It is a multi-transition phase (step) function. The best hypothesis with the number of transitions equal to or less than a given limit is returned. When the limit is 1, pulse is almost the same as stump (the only difference is that pulse may return a hypothesis with no transitions at all). The code has been tuned so that it is even faster than stump when the limit is 1.
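To make "multi-transition step function" concrete: a pulse hypothesis outputs -1/+1 and flips sign at each transition point. A conceptual evaluation routine, not Lemga's Pulse interface:

    #include <vector>

    // trans holds the sorted transition locations; the output starts at
    // `start` (-1 or +1) and flips sign at every transition passed by x.
    int pulse_eval (double x, const std::vector<double>& trans, int start) {
        int y = start;
        for (size_t i = 0; i < trans.size() && x > trans[i]; ++i)
            y = -y;
        return y;
    }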
_line_search will stop early if a non-descending direction is met. This change affects conjugate gradient, boosting in the functional space, and the training of neural networks. For example, convex boosting now returns a very large number as the cost when empty, to avoid a non-descending direction at the first step.
REGISTER_CREATOR simplifies the object creator registration.
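Typical usage would presumably be a single registration per model class so that create(istream) can reconstruct objects by id; the argument form here is a guess:

    // hypothetical usage, e.g. in the model's source file
    REGISTER_CREATOR(lemga::Stump);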
id() returns a const reference.
Test code (for Pulse) and a model file checker (showlm) added.
Bagging bag(n_in, n_out) simply becomes Bagging bag.
The gradient-descent core of boosting (_boost_gd) was reimplemented so that conjugate gradient and some variants of gradient descent are possible (in the functional space).
Object creators can be registered (_register_creator); constructors accepting istream as the argument were added (interface only); version() was renamed to id() and no longer contains version information.
lemga::op added. It serves only for generic optimization in Lemga; only a small set of operations is needed for optimization.
create() added as a virtual constructor.
The code rewriting is almost done and I've tested Lemga in one project (alphaBoost) with GCC 2.96, 3.0.1, and 3.2.1. Models and algorithms currently coded are: