LearnModel Class Reference

A unified interface for learning models.

#include <learnmodel.h>

[Inheritance diagram for LearnModel]

[Collaboration diagram for LearnModel]

Public Member Functions

 LearnModel (UINT n_in=0, UINT n_out=0)
virtual Output operator() (const Input &) const =0
virtual Output get_output (UINT idx) const
 Get the output of the hypothesis on the idx-th input.
bool valid_dimensions (UINT, UINT) const
bool valid_dimensions (const LearnModel &l) const
bool exact_dimensions (UINT i, UINT o) const
bool exact_dimensions (const LearnModel &l) const
bool exact_dimensions (const DataSet &d) const
virtual LearnModel * create () const =0
 Create a new object using the default constructor.
virtual LearnModel * clone () const =0
 Create a new object by replicating itself.
UINT n_input () const
UINT n_output () const
void set_log_file (FILE *f)
virtual bool support_weighted_data () const
 Whether the learning model/algorithm supports unequally weighted data.
virtual REAL r_error (const Output &out, const Output &y) const
 Error measure for regression problems.
virtual REAL c_error (const Output &out, const Output &y) const
 Error measure for classification problems.
REAL train_r_error () const
 Training error (regression).
REAL train_c_error () const
 Training error (classification).
REAL test_r_error (const pDataSet &) const
 Test error (regression).
REAL test_c_error (const pDataSet &) const
 Test error (classification).
virtual void initialize ()
virtual void set_train_data (const pDataSet &, const pDataWgt &=0)
 Set the data set and sample weight to be used in training.
const pDataSet & train_data () const
 Return pointer to the embedded training data set.
virtual void train ()=0
 Train with preset data set and sample weight.
virtual void reset ()
virtual REAL margin_norm () const
 The normalization term for margins.
virtual REAL margin_of (const Input &x, const Output &y) const
 Report the (unnormalized) margin of an example (x, y).
virtual REAL margin (UINT i) const
 Report the (unnormalized) margin of the example i.
REAL min_margin () const
 The minimal (unnormalized) in-sample margin.

Protected Member Functions

void set_dimensions (UINT, UINT)
void set_dimensions (const LearnModel &l)
void set_dimensions (const DataSet &d)
virtual bool serialize (std::ostream &, ver_list &) const
virtual bool unserialize (std::istream &, ver_list &, const id_t &=NIL_ID)

Protected Attributes

UINT _n_in
 input dimension of the model
UINT _n_out
 output dimension of the model
pDataSet ptd
 pointer to the training data set
pDataWgt ptw
 pointer to the sample weight (for training)
UINT n_samples
 equal to ptd->size()
FILE * logf
 file to record train/validate error

Detailed Description

A unified interface for learning models.

I try to provide two error measures, r_error and c_error. For regression problems, r_error should be defined; for classification problems, c_error should be defined. Both errors can be present in the same model.

The training data is stored with the learning model (as a pointer).

Still to be documented: why the data is stored with the model, and as a pointer (possibly it should not be a pointer); what this changes compared with a normal implementation; and the sample weight, which could be null if the model doesn't support ..., and otherwise should be a probability vector (random_sample) ...

The flowchart of the learning ...

  1. Create a new instance, load one from a file, and/or reset an existing one: lm->reset();
  2. lm->set_train_data(sample_data);
    Specify the training data.
  3. lm->train();
    Train the model. train() is declared void, so there is no return value to check.
  4. y = (*lm)(x);
    Apply the learning model to new data.
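
The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the typedefs, MiniModel, MeanModel and run_flow are hypothetical stand-ins written for this sketch, not the LEMGA declarations, and the sample weight argument is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in types for this sketch only; the real LEMGA typedefs may differ.
typedef double REAL;
typedef std::vector<REAL> Input;
typedef std::vector<REAL> Output;
typedef std::vector<std::pair<Input, Output> > DataSet;
typedef std::shared_ptr<const DataSet> pDataSet;

// Reduced interface with only the calls used in the flowchart.
class MiniModel {
public:
    virtual ~MiniModel() {}
    virtual void reset() {}
    virtual void set_train_data(const pDataSet& pd) { ptd = pd; }
    virtual void train() = 0;
    virtual Output operator()(const Input& x) const = 0;
protected:
    pDataSet ptd;  // the training data is held by pointer
};

// Hypothetical model: predicts the mean of the first output component.
class MeanModel : public MiniModel {
public:
    MeanModel() : mean(0) {}
    virtual void reset() { mean = 0; }
    virtual void train() {
        assert(ptd && !ptd->empty());
        REAL sum = 0;
        for (std::size_t i = 0; i < ptd->size(); ++i)
            sum += (*ptd)[i].second[0];
        mean = sum / ptd->size();
    }
    virtual Output operator()(const Input&) const { return Output(1, mean); }
private:
    REAL mean;
};

// Steps 1 to 4 of the flowchart, in order.
Output run_flow(MiniModel& lm, const pDataSet& data, const Input& x) {
    lm.reset();               // 1. reset an existing model
    lm.set_train_data(data);  // 2. specify the training data
    lm.train();               // 3. train (no return value)
    return lm(x);             // 4. apply the model to new data
}
```

With two training examples whose outputs are 2 and 4, run_flow returns a one-element output holding 3 for any input x.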

Todo:
documentation

Do we really need two errors?

Definition at line 64 of file learnmodel.h.


Constructor & Destructor Documentation

LearnModel (UINT n_in = 0, UINT n_out = 0)
 

Parameters:
n_in is the dimension of input.
n_out is the dimension of output.

Definition at line 70 of file learnmodel.cpp.


Member Function Documentation

REAL c_error (const Output & out, const Output & y) const [virtual]
 

Error measure for classification problems.

Parameters:
out is the output from the learned hypothesis.
y is the real output.
Returns:
Classification error between out and y. The error measure is not necessarily symmetric. A commonly used measure is out != y.
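
As an illustration of the out != y measure, here is a minimal sketch of a 0/1 error over vector-valued outputs. The typedefs, the function name zero_one_error and the tolerance parameter are assumptions made for this example; the actual default implementation in learnmodel.cpp is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef double REAL;               // stand-in for the library's REAL
typedef std::vector<REAL> Output;  // stand-in for the library's Output

// 0/1 classification error: 1 if any component of out differs from y
// by more than tol, 0 otherwise. tol is a hypothetical parameter.
REAL zero_one_error(const Output& out, const Output& y, REAL tol = 1e-9) {
    assert(out.size() == y.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        if (std::fabs(out[i] - y[i]) > tol) return 1;
    return 0;
}
```

This particular measure happens to be symmetric; an asymmetric one could, for instance, charge different costs depending on which class is the true one.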

Reimplemented in MultiClass_ECOC, and Ordinal_BLE.

Definition at line 112 of file learnmodel.cpp.

References INFINITESIMAL, and LearnModel::n_output().

Referenced by CGBoost::linear_weight(), AdaBoost::linear_weight(), lemga::lp_add_hypothesis(), LearnModel::test_c_error(), and LearnModel::train_c_error().

virtual LearnModel* clone () const [pure virtual]
 

Create a new object by replicating itself.

Returns:
A pointer to the new copy.
The code for a derived class Derived is always
 return new Derived(*this); 
Though seemingly redundant, it helps to copy an object without knowing the real type of the object.
See also:
C++ FAQ Lite 20.6
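
The idiom described above (often called a virtual constructor) can be sketched as follows. Base, Derived and duplicate are hypothetical names used only for this example; the sketch also shows the matching create() pattern.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class with the two factory methods.
class Base {
public:
    virtual ~Base() {}
    virtual Base* create() const = 0;  // new default-constructed object
    virtual Base* clone() const = 0;   // new copy of *this
    virtual int value() const = 0;
};

class Derived : public Base {
public:
    explicit Derived(int v = 0) : v_(v) {}
    virtual Derived* create() const { return new Derived(); }
    virtual Derived* clone() const { return new Derived(*this); }
    virtual int value() const { return v_; }
private:
    int v_;
};

// Copies an object through a base reference without knowing its real type.
std::unique_ptr<Base> duplicate(const Base& b) {
    return std::unique_ptr<Base>(b.clone());
}
```

The caller of duplicate() sees only Base, yet the returned object has the dynamic type and state of the original.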

Implements Object.

Implemented in AdaBoost, AdaBoost_ECOC, AdaBoost_ERP, Aggregating, Bagging, Boosting, Cascade, CGBoost, CrossVal, vFoldCrossVal, HoldoutCrossVal, FeedForwardNN, LPBoost, MgnBoost, MultiClass_ECOC, NNLayer, Ordinal_BLE, Perceptron, Pulse, Stump, and SVM.

Referenced by CrossVal::add_model(), Aggregating::set_base_model(), and Ordinal_BLE::set_model().

virtual LearnModel* create () const [pure virtual]
 

Create a new object using the default constructor.

The code for a derived class Derived is always

 return new Derived(); 

Implements Object.

Implemented in AdaBoost, AdaBoost_ECOC, AdaBoost_ERP, Aggregating, Bagging, Boosting, Cascade, CGBoost, CrossVal, vFoldCrossVal, HoldoutCrossVal, FeedForwardNN, LPBoost, MgnBoost, MultiClass_ECOC, NNLayer, Ordinal_BLE, Perceptron, Pulse, Stump, and SVM.

bool exact_dimensions (const DataSet & d) const [inline]
 

Definition at line 179 of file learnmodel.h.

References LearnModel::exact_dimensions(), dataset::size(), dataset::x(), and dataset::y().

bool exact_dimensions (const LearnModel & l) const [inline]
 

Definition at line 177 of file learnmodel.h.

References LearnModel::exact_dimensions(), LearnModel::n_input(), and LearnModel::n_output().

bool exact_dimensions (UINT i, UINT o) const [inline]
 

Definition at line 175 of file learnmodel.h.

References LearnModel::valid_dimensions().

Referenced by LearnModel::exact_dimensions(), Boosting::get_output(), CGBoost::linear_weight(), AdaBoost::linear_weight(), Boosting::operator()(), Bagging::operator()(), LearnModel::set_dimensions(), and Aggregating::unserialize().

virtual Output get_output (UINT idx) const [inline, virtual]
 

Get the output of the hypothesis on the idx-th input.

Note:
It is possible to cache results to save computational effort.

Reimplemented in Boosting, CrossVal, and MultiClass_ECOC.

Definition at line 139 of file learnmodel.h.

References LearnModel::operator()(), LearnModel::ptd, and LearnModel::ptw.

Referenced by FeedForwardNN::cost(), lemga::op::inner_product(), CGBoost::linear_weight(), AdaBoost::linear_weight(), lemga::lp_add_hypothesis(), LearnModel::train_c_error(), and LearnModel::train_r_error().

virtual void initialize () [inline, virtual]
 

Reimplemented in FeedForwardNN, NNLayer, Perceptron, and SVM.

Definition at line 110 of file learnmodel.h.

virtual REAL margin (UINT i) const [inline, virtual]
 

Report the (unnormalized) margin of the example i.

Note:
It is possible to cache results to save computational effort.

Reimplemented in Boosting, CrossVal, and MultiClass_ECOC.

Definition at line 164 of file learnmodel.h.

References LearnModel::margin_of(), LearnModel::ptd, and LearnModel::ptw.

Referenced by LearnModel::min_margin().

virtual REAL margin_norm () const [inline, virtual]
 

The normalization term for margins.

The margin concept can be normalized or unnormalized. For example, for a perceptron model, the unnormalized margin would be the weighted sum of the input features, the normalized margin would be the distance to the hyperplane, and the normalization term is the norm of the hyperplane weight.

Since the normalization term is usually a constant, it is more efficient to precompute it than to recalculate it every time a margin is requested. The best way is to use a cache. Here I use an easier way: let the users decide when to compute the normalization term.
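
The perceptron example can be made concrete with a small sketch. It assumes a linear model without a bias term and labels in {-1, +1}, and it multiplies the weighted sum by the label so that a positive margin means a correct prediction; these choices, and the free functions unnormalized_margin and weight_norm, are assumptions of this example and not the library's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Unnormalized margin of the example (x, y): y times the weighted sum
// of the input features.
double unnormalized_margin(const Vec& w, const Vec& x, double y) {
    assert(w.size() == x.size());
    double sum = 0;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];
    return y * sum;
}

// Normalization term: the Euclidean norm of the hyperplane weight.
// It depends only on w, so it can be computed once and reused.
double weight_norm(const Vec& w) {
    double sq = 0;
    for (std::size_t i = 0; i < w.size(); ++i) sq += w[i] * w[i];
    return std::sqrt(sq);
}
```

For w = (3, 4), x = (1, 1) and y = +1, the unnormalized margin is 7 and the normalization term is 5, so the normalized margin (the distance to the hyperplane) is 7 / 5 = 1.4. A caller would compute weight_norm(w) once and divide each margin by it.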

Reimplemented in Bagging, Boosting, CrossVal, Perceptron, and SVM.

Definition at line 158 of file learnmodel.h.

REAL margin_of (const Input & x, const Output & y) const [virtual]
 

Report the (unnormalized) margin of an example (x, y).

Reimplemented in Bagging, Boosting, CrossVal, MultiClass_ECOC, Perceptron, and SVM.

Definition at line 199 of file learnmodel.cpp.

References OBJ_FUNC_UNDEFINED.

Referenced by LearnModel::margin().

REAL min_margin () const
 

The minimal (unnormalized) in-sample margin.

Definition at line 203 of file learnmodel.cpp.

References INFINITESIMAL, INFINITY, LearnModel::margin(), LearnModel::n_samples, and LearnModel::ptw.

UINT n_input () const [inline]
 

Definition at line 81 of file learnmodel.h.

References LearnModel::_n_in.

Referenced by FeedForwardNN::add_top(), NNLayer::back_propagate(), LearnModel::exact_dimensions(), NNLayer::feed_forward(), SVM::operator()(), Stump::operator()(), Pulse::operator()(), Perceptron::operator()(), FeedForwardNN::operator()(), LearnModel::set_dimensions(), Pulse::set_index(), SVM::signed_margin(), and LearnModel::valid_dimensions().

UINT n_output () const [inline]
 

Definition at line 82 of file learnmodel.h.

References LearnModel::_n_out.

Referenced by FeedForwardNN::_cost_deriv(), FeedForwardNN::add_top(), NNLayer::back_propagate(), Cascade::belief(), Ordinal_BLE::c_error(), MultiClass_ECOC::c_error(), LearnModel::c_error(), MultiClass_ECOC::distances(), LearnModel::exact_dimensions(), NNLayer::feed_forward(), NNLayer::operator()(), Ordinal_BLE::r_error(), LearnModel::r_error(), LearnModel::set_dimensions(), NNLayer::size(), and LearnModel::valid_dimensions().

virtual Output operator() (const Input &) const [pure virtual]
 

Implemented in Bagging, Boosting, Cascade, CrossVal, FeedForwardNN, MultiClass_ECOC, NNLayer, Ordinal_BLE, Perceptron, Pulse, Stump, and SVM.

Referenced by LearnModel::get_output().

REAL r_error (const Output & out, const Output & y) const [virtual]
 

Error measure for regression problems.

Parameters:
out is the output from the learned hypothesis.
y is the real output.
Returns:
Regression error between out and y. A commonly used measure is the squared error.
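
A minimal sketch of the squared error mentioned above, summed over the output components. The typedefs and the name squared_error are assumptions of this example; whether the library's default additionally scales the sum (for instance by the output dimension) is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef double REAL;               // stand-in for the library's REAL
typedef std::vector<REAL> Output;  // stand-in for the library's Output

// Sum of squared differences between the hypothesis output and the target.
REAL squared_error(const Output& out, const Output& y) {
    assert(out.size() == y.size());
    REAL sum = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const REAL d = out[i] - y[i];
        sum += d * d;
    }
    return sum;
}
```

For out = (1, 2) and y = (0, 4), the differences are 1 and -2, so the error is 1 + 4 = 5.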

Reimplemented in Ordinal_BLE.

Definition at line 94 of file learnmodel.cpp.

References LearnModel::_n_out, and LearnModel::n_output().

Referenced by FeedForwardNN::_cost(), LearnModel::test_r_error(), and LearnModel::train_r_error().

void reset () [virtual]
 

Clean up the learning model but keep most settings.

Note:
This is probably needed after training or loading from a file, and before training again.

Reimplemented in Aggregating, Boosting, CGBoost, CrossVal, MultiClass_ECOC, and Ordinal_BLE.

Definition at line 195 of file learnmodel.cpp.

References LearnModel::_n_in, and LearnModel::_n_out.

Referenced by Ordinal_BLE::reset(), CrossVal::reset(), and Aggregating::reset().

bool serialize (std::ostream &, ver_list &) const [protected, virtual]
 

Reimplemented in Aggregating, Boosting, Cascade, CGBoost, CrossVal, vFoldCrossVal, HoldoutCrossVal, FeedForwardNN, MultiClass_ECOC, NNLayer, Ordinal_BLE, Perceptron, Pulse, Stump, and SVM.

Definition at line 74 of file learnmodel.cpp.

References LearnModel::_n_in, LearnModel::_n_out, and SERIALIZE_PARENT.

void set_dimensions (const DataSet & d) [inline, protected]
 

Definition at line 187 of file learnmodel.h.

References LearnModel::exact_dimensions(), LearnModel::set_dimensions(), dataset::x(), and dataset::y().

void set_dimensions (const LearnModel & l) [inline, protected]
 

Definition at line 185 of file learnmodel.h.

References LearnModel::n_input(), LearnModel::n_output(), and LearnModel::set_dimensions().

void set_dimensions (UINT, UINT) [protected]
 

Definition at line 219 of file learnmodel.cpp.

References LearnModel::_n_in, LearnModel::_n_out, and LearnModel::valid_dimensions().

Referenced by CrossVal::add_model(), Perceptron::initialize(), MultiClass_ECOC::MultiClass_ECOC(), Ordinal_BLE::Ordinal_BLE(), LearnModel::set_dimensions(), Perceptron::set_weight(), _boost_gd::set_weight(), SVM::train(), Stump::train(), Pulse::train(), Perceptron::train(), Ordinal_BLE::train(), MultiClass_ECOC::train(), LPBoost::train(), CrossVal::train(), Boosting::train(), and Bagging::train().

void set_log_file (FILE * f) [inline]
 

Definition at line 84 of file learnmodel.h.

References LearnModel::logf.

void set_train_data (const pDataSet & pd, const pDataWgt & pw = 0) [virtual]
 

Set the data set and sample weight to be used in training.

If the learning model/algorithm can only train with uniform sample weights, i.e., support_weighted_data() returns false, a "bootstrapped" copy of the original data set will be generated and used in the subsequent training. The bootstrapping is done by randomly picking samples (with replacement) according to the given weight pw.

To make life easier, when support_weighted_data() returns true, a null pw will be replaced by a uniform probability vector. So we have the following invariant:

Invariant:
support_weighted_data() == (ptw != 0)
Parameters:
pd gives the data set.
pw gives the sample weight, whose default value is 0.
See also:
support_weighted_data(), train()
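
The two cases described above can be sketched with the standard library. This is an illustration only: bootstrap_indices and uniform_weight are hypothetical helpers, the seed parameter is added here for reproducibility, and the library's own sampling routine may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Case 1 (weights not supported): draw n sample indices with replacement,
// index i being picked with probability proportional to wgt[i].
std::vector<std::size_t> bootstrap_indices(const std::vector<double>& wgt,
                                           std::size_t n, unsigned seed) {
    assert(!wgt.empty());
    std::mt19937 gen(seed);
    std::discrete_distribution<std::size_t> pick(wgt.begin(), wgt.end());
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = pick(gen);
    return idx;
}

// Case 2 (weights supported, null pw): a uniform probability vector.
std::vector<double> uniform_weight(std::size_t n) {
    assert(n > 0);
    return std::vector<double>(n, 1.0 / n);
}
```

The bootstrapped data set is then made of the examples at the drawn indices, so examples with a larger weight tend to appear more often and examples with zero weight never appear.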

Reimplemented in Aggregating, Boosting, CrossVal, MultiClass_ECOC, and Ordinal_BLE.

Definition at line 165 of file learnmodel.cpp.

References EPSILON, LearnModel::n_samples, LearnModel::ptd, LearnModel::ptw, and LearnModel::support_weighted_data().

Referenced by Ordinal_BLE::set_train_data(), CrossVal::set_train_data(), Aggregating::set_train_data(), Bagging::train(), and AdaBoost_ECOC::train_with_full_partition().

virtual bool support_weighted_data () const [inline, virtual]
 

Whether the learning model/algorithm supports unequally weighted data.

Returns:
true if unequally weighted data is supported; false otherwise. The default is false, to be on the safe side.
See also:
set_train_data()

Reimplemented in Bagging, Boosting, Cascade, FeedForwardNN, MultiClass_ECOC, Ordinal_BLE, Perceptron, Pulse, Stump, and SVM.

Definition at line 94 of file learnmodel.h.

Referenced by LearnModel::set_train_data().

REAL test_c_error (const pDataSet &) const
 

Test error (classification).

Definition at line 142 of file learnmodel.cpp.

References LearnModel::c_error().

REAL test_r_error (const pDataSet &) const
 

Test error (regression).

Definition at line 134 of file learnmodel.cpp.

References LearnModel::r_error().

virtual void train () [pure virtual]
 

Train with preset data set and sample weight.

Implemented in AdaBoost, Bagging, Boosting, Cascade, CGBoost, CrossVal, FeedForwardNN, LPBoost, MgnBoost, MultiClass_ECOC, NNLayer, Ordinal_BLE, Perceptron, Pulse, Stump, and SVM.

Referenced by Bagging::train(), and AdaBoost_ECOC::train_with_full_partition().

REAL train_c_error () const
 

Training error (classification).

Definition at line 126 of file learnmodel.cpp.

References LearnModel::c_error(), LearnModel::get_output(), LearnModel::n_samples, LearnModel::ptd, and LearnModel::ptw.

Referenced by Perceptron::log_error(), and Boosting::train().

const pDataSet& train_data () const [inline]
 

Return pointer to the embedded training data set.

Definition at line 118 of file learnmodel.h.

References LearnModel::ptd.

Referenced by Boosting::get_output(), lemga::op::inner_product(), CGBoost::linear_weight(), AdaBoost::linear_weight(), and lemga::lp_add_hypothesis().

REAL train_r_error () const
 

Training error (regression).

Definition at line 118 of file learnmodel.cpp.

References LearnModel::get_output(), LearnModel::n_samples, LearnModel::ptd, LearnModel::ptw, and LearnModel::r_error().

bool unserialize (std::istream &, ver_list &, const id_t & = NIL_ID) [protected, virtual]
 

Reimplemented in Aggregating, Boosting, Cascade, CGBoost, CrossVal, vFoldCrossVal, HoldoutCrossVal, FeedForwardNN, MultiClass_ECOC, NNLayer, Ordinal_BLE, Perceptron, Pulse, Stump, and SVM.

Definition at line 80 of file learnmodel.cpp.

References LearnModel::_n_in, LearnModel::_n_out, LearnModel::n_samples, Object::NIL_ID, LearnModel::ptd, LearnModel::ptw, and UNSERIALIZE_PARENT.

bool valid_dimensions (const LearnModel & l) const [inline]
 

Definition at line 172 of file learnmodel.h.

References LearnModel::n_input(), LearnModel::n_output(), and LearnModel::valid_dimensions().

bool valid_dimensions (UINT, UINT) const
 

Definition at line 214 of file learnmodel.cpp.

References LearnModel::_n_in, and LearnModel::_n_out.

Referenced by LearnModel::exact_dimensions(), Ordinal_BLE::operator()(), Aggregating::reset(), Aggregating::set_base_model(), LearnModel::set_dimensions(), CrossVal::unserialize(), Aggregating::unserialize(), and LearnModel::valid_dimensions().


Member Data Documentation

UINT _n_in [protected]
 

input dimension of the model

Definition at line 66 of file learnmodel.h.

Referenced by FeedForwardNN::add_top(), NNLayer::back_propagate(), NNLayer::feed_forward(), Perceptron::fld(), Perceptron::initialize(), LearnModel::n_input(), Perceptron::Perceptron(), LearnModel::reset(), SVM::serialize(), Perceptron::serialize(), NNLayer::serialize(), LearnModel::serialize(), LearnModel::set_dimensions(), NNLayer::set_weight(), SVM::train(), Stump::train(), Pulse::train(), SVM::unserialize(), Stump::unserialize(), Perceptron::unserialize(), NNLayer::unserialize(), LearnModel::unserialize(), FeedForwardNN::unserialize(), Aggregating::unserialize(), and LearnModel::valid_dimensions().

UINT _n_out [protected]
 

output dimension of the model

Definition at line 67 of file learnmodel.h.

Referenced by FeedForwardNN::_cost_deriv(), FeedForwardNN::add_top(), NNLayer::back_propagate(), NNLayer::feed_forward(), Boosting::get_output(), FeedForwardNN::gradient(), LearnModel::n_output(), Boosting::operator()(), Bagging::operator()(), LearnModel::r_error(), LearnModel::reset(), NNLayer::serialize(), LearnModel::serialize(), LearnModel::set_dimensions(), NNLayer::set_weight(), MultiClass_ECOC::train(), Stump::unserialize(), NNLayer::unserialize(), LearnModel::unserialize(), FeedForwardNN::unserialize(), Aggregating::unserialize(), and LearnModel::valid_dimensions().

FILE* logf [protected]
 

file to record train/validate error

Definition at line 72 of file learnmodel.h.

Referenced by FeedForwardNN::log_cost(), Perceptron::log_error(), and LearnModel::set_log_file().

UINT n_samples [protected]
 

equal to ptd->size()

Definition at line 70 of file learnmodel.h.

Referenced by Boosting::assign_weight(), Boosting::clear_cache(), AdaBoost_ECOC::confusion_matrix(), Boosting::cost(), Ordinal_BLE::extend_data(), Perceptron::fld(), CGBoost::linear_smpwgt(), AdaBoost::linear_smpwgt(), CGBoost::linear_weight(), AdaBoost::linear_weight(), LearnModel::min_margin(), Boosting::sample_weight(), Ordinal_BLE::set_train_data(), MultiClass_ECOC::set_train_data(), LearnModel::set_train_data(), AdaBoost_ECOC::setup_aux(), AdaBoost_ECOC::smpwgt_with_partition(), Stump::train(), Pulse::train(), LPBoost::train(), CGBoost::train(), Bagging::train(), AdaBoost::train(), LearnModel::train_c_error(), LearnModel::train_r_error(), AdaBoost_ERP::train_with_partial_partition(), AdaBoost_ERP::train_with_partition(), LearnModel::unserialize(), and Boosting::update_smpwgt().

pDataSet ptd [protected]
 

pointer to the training data set

Definition at line 68 of file learnmodel.h.

Referenced by MultiClass_ECOC::cost(), FeedForwardNN::cost(), Boosting::cost(), HoldoutCrossVal::cv_round(), vFoldCrossVal::cv_round(), Ordinal_BLE::extend_data(), Perceptron::fld(), LearnModel::get_output(), CrossVal::get_output(), Boosting::get_output(), FeedForwardNN::gradient(), Perceptron::initialize(), CGBoost::linear_smpwgt(), CGBoost::linear_weight(), AdaBoost::linear_weight(), LearnModel::margin(), CrossVal::margin(), Boosting::margin(), Perceptron::matrix(), Boosting::sample_weight(), Perceptron::set_data(), Ordinal_BLE::set_train_data(), MultiClass_ECOC::set_train_data(), LearnModel::set_train_data(), Boosting::set_train_data(), Aggregating::set_train_data(), Stump::train(), Pulse::train(), Perceptron::train(), Ordinal_BLE::train(), MultiClass_ECOC::train(), LPBoost::train(), FeedForwardNN::train(), CrossVal::train(), Boosting::train(), Bagging::train(), LearnModel::train_c_error(), LearnModel::train_data(), LearnModel::train_r_error(), AdaBoost_ECOC::train_with_full_partition(), AdaBoost_ERP::train_with_partial_partition(), MultiClass_ECOC::train_with_partition(), AdaBoost_ERP::train_with_partition(), AdaBoost_ECOC::train_with_partition(), Boosting::train_with_smpwgt(), Ordinal_BLE::unserialize(), and LearnModel::unserialize().

pDataWgt ptw [protected]
 

pointer to the sample weight (for training)

Definition at line 69 of file learnmodel.h.

Referenced by MultiClass_ECOC::cost(), FeedForwardNN::cost(), Boosting::cost(), Perceptron::fld(), MultiClass_ECOC::get_output(), LearnModel::get_output(), Boosting::get_output(), FeedForwardNN::gradient(), CGBoost::linear_smpwgt(), MultiClass_ECOC::margin(), LearnModel::margin(), LearnModel::min_margin(), Boosting::sample_weight(), LearnModel::set_train_data(), Aggregating::set_train_data(), AdaBoost_ECOC::setup_aux(), Stump::train(), Pulse::train(), Perceptron::train(), Ordinal_BLE::train(), MultiClass_ECOC::train(), LPBoost::train(), FeedForwardNN::train(), CrossVal::train(), CGBoost::train(), Boosting::train(), Bagging::train(), LearnModel::train_c_error(), LearnModel::train_r_error(), MultiClass_ECOC::train_with_partition(), AdaBoost_ERP::train_with_partition(), AdaBoost_ECOC::train_with_partition(), Boosting::train_with_smpwgt(), Ordinal_BLE::unserialize(), and LearnModel::unserialize().


The documentation for this class was generated from the following files: learnmodel.h and learnmodel.cpp.
Generated on Wed Nov 8 08:16:59 2006 for LEMGA by  doxygen 1.4.6