CrossVal Class Reference

A combination of cross-validation and model selection.

#include <crossval.h>

Public Member Functions

 CrossVal ()
 CrossVal (const CrossVal &)
const CrossVal & operator= (const CrossVal &)
virtual CrossVal * create () const =0
 Create a new object using the default constructor.
virtual CrossVal * clone () const =0
 Create a new object by replicating itself.
void add_model (const LearnModel &)
 add a candidate model to be cross-validated
UINT size () const
 the number of candidate models under cross-validation
const LearnModel & model (UINT n) const
 the n-th candidate model
UINT rounds () const
 how many rounds of cross-validation?
void set_rounds (UINT r)
 specify the number of rounds of cross-validation
bool full_train () const
 train the best model on the full set?
void set_full_train (bool f=true)
virtual void set_train_data (const pDataSet &, const pDataWgt &=0)
 Set the data set and sample weight to be used in training.
virtual void train ()
 Train with preset data set and sample weight.
virtual void reset ()
virtual Output operator() (const Input &x) const
virtual Output get_output (UINT i) const
 Get the output of the hypothesis on the i-th input.
virtual REAL margin_norm () const
 The normalization term for margins.
virtual REAL margin_of (const Input &x, const Output &y) const
 Report the (unnormalized) margin of an example (x, y).
virtual REAL margin (UINT i) const
 Report the (unnormalized) margin of the example i.
REAL error (UINT n) const
 the cross-validation error of the n-th candidate model
const LearnModel & best_model () const
 the best model (trained if full_train() == true)

Protected Member Functions

virtual std::vector< REAL > cv_round () const =0
 one round of the cross-validation operation
virtual bool serialize (std::ostream &, ver_list &) const
virtual bool unserialize (std::istream &, ver_list &, const id_t &=NIL_ID)

Protected Attributes

bool fullset
 train the best model on the full set?
std::vector< pcLearnModel > lm
 all candidate models
std::vector< REAL > err
 cross-validation errors
UINT n_rounds
 # of CV rounds, to beat the variance
pLearnModel best_lm
 the best model (trained on the full set)
int best

Detailed Description

A combination of cross-validation and model selection.

Note:
The interface is experimental. For example, it might be placed under LearnModel.

Definition at line 21 of file crossval.h.


Constructor & Destructor Documentation

CrossVal () [inline]
 

Definition at line 33 of file crossval.h.

CrossVal (const CrossVal &)
 

Definition at line 15 of file crossval.cpp.

References CrossVal::best, CrossVal::best_lm, and CrossVal::lm.


Member Function Documentation

void add_model (const LearnModel &)
 

add a candidate model to be cross-validated

Definition at line 93 of file crossval.cpp.

References LearnModel::clone(), CrossVal::err, CrossVal::lm, and LearnModel::set_dimensions().

const LearnModel& best_model () const [inline]
 

the best model (trained if full_train() == true)

Definition at line 79 of file crossval.h.

References CrossVal::best, CrossVal::best_lm, and CrossVal::lm.

virtual CrossVal* clone () const [pure virtual]
 

Create a new object by replicating itself.

Returns:
A pointer to the new copy.
The code for a derived class Derived is always
 return new Derived(*this); 
Though seemingly redundant, it helps to copy an object without knowing the real type of the object.
See also:
C++ FAQ Lite 20.6

Implements LearnModel.

Implemented in vFoldCrossVal, and HoldoutCrossVal.

virtual CrossVal* create () const [pure virtual]
 

Create a new object using the default constructor.

The code for a derived class Derived is always

 return new Derived(); 

Implements LearnModel.

Implemented in vFoldCrossVal, and HoldoutCrossVal.
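
The clone() and create() conventions described above are the usual "virtual constructor" idiom. The following is a minimal sketch of that idiom only; Model, Stump, threshold() and duplicate() are hypothetical names invented for illustration and are not part of LEMGA.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base class standing in for a polymorphic model type.
class Model {
public:
    virtual ~Model() = default;
    virtual Model* create() const = 0;  // new default-constructed object
    virtual Model* clone() const = 0;   // new copy of *this
    virtual std::string name() const = 0;
};

// Hypothetical derived class; it carries state so that create() and
// clone() give visibly different results.
class Stump : public Model {
public:
    explicit Stump(int threshold = 0) : threshold_(threshold) {}
    Stump* create() const override { return new Stump(); }
    Stump* clone() const override { return new Stump(*this); }
    std::string name() const override { return "Stump"; }
    int threshold() const { return threshold_; }
private:
    int threshold_;
};

// Copies a model through a base-class reference without knowing
// the real type of the object.
std::unique_ptr<Model> duplicate(const Model& m) {
    return std::unique_ptr<Model>(m.clone());
}
```

Calling duplicate() on a Stump yields another Stump with the same threshold, while create() yields a default-constructed one. The raw pointers are wrapped in std::unique_ptr here only to keep the sketch leak-free.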

virtual std::vector<REAL> cv_round () const [protected, pure virtual]
 

one round of the cross-validation operation

Implemented in vFoldCrossVal, and HoldoutCrossVal.

Referenced by CrossVal::train().

REAL error (UINT n) const [inline]
 

the cross-validation error of the n-th candidate model

Definition at line 76 of file crossval.h.

References CrossVal::err, and CrossVal::size().

bool full_train () const [inline]
 

train the best model on the full set?

Definition at line 53 of file crossval.h.

References CrossVal::fullset.

virtual Output get_output (UINT i) const [inline, virtual]
 

Get the output of the hypothesis on the i-th input.

Note:
It is possible to cache results to save computational effort.

Reimplemented from LearnModel.

Definition at line 62 of file crossval.h.

References CrossVal::best, CrossVal::best_lm, and LearnModel::ptd.

virtual REAL margin (UINT i) const [inline, virtual]
 

Report the (unnormalized) margin of the example i.

Note:
It is possible to cache results to save computational effort.

Reimplemented from LearnModel.

Definition at line 71 of file crossval.h.

References CrossVal::best, CrossVal::best_lm, and LearnModel::ptd.

virtual REAL margin_norm () const [inline, virtual]
 

The normalization term for margins.

The margin concept can be normalized or unnormalized. For example, for a perceptron model, the unnormalized margin is the weighted sum of the input features, the normalized margin is the distance to the hyperplane, and the normalization term is the norm of the hyperplane weight.

Since the normalization term is usually a constant, it is more efficient to precompute it than to recalculate it every time a margin is requested. The best way is to use a cache. Here I use an easier way: let the users decide when to compute the normalization term.

Reimplemented from LearnModel.

Definition at line 65 of file crossval.h.

References CrossVal::best, and CrossVal::best_lm.
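
As an illustration of the perceptron case described above, here is a minimal sketch that is independent of LEMGA. The free functions margin_of() and margin_norm() merely borrow the member names; the use of std::vector<double> for weights and inputs, and a label y in {-1, +1} multiplying the weighted sum, are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unnormalized margin of an example (x, y) for a linear model with
// weight vector w: y times the weighted sum of the input features.
// Assumes y is -1 or +1 and that w and x have the same length.
double margin_of(const std::vector<double>& w,
                 const std::vector<double>& x, int y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];
    return y * sum;
}

// Normalization term: the Euclidean norm of the weight vector.
double margin_norm(const std::vector<double>& w) {
    double sq = 0.0;
    for (double wi : w) sq += wi * wi;
    return std::sqrt(sq);
}
```

A caller that needs normalized margins for many examples can compute margin_norm(w) once and divide each margin_of(w, x, y) by it, which matches the idea of letting the user decide when the normalization term is computed.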

virtual REAL margin_of (const Input &x, const Output &y) const [inline, virtual]
 

Report the (unnormalized) margin of an example (x, y).

Reimplemented from LearnModel.

Definition at line 68 of file crossval.h.

References CrossVal::best, and CrossVal::best_lm.

const LearnModel& model (UINT n) const [inline]
 

the n-th candidate model

Definition at line 45 of file crossval.h.

References CrossVal::lm, and CrossVal::size().

virtual Output operator() (const Input &x) const [inline, virtual]
 

Implements LearnModel.

Definition at line 59 of file crossval.h.

References CrossVal::best, and CrossVal::best_lm.

const CrossVal & operator= (const CrossVal &)
 

Definition at line 26 of file crossval.cpp.

References CrossVal::best, CrossVal::best_lm, CrossVal::err, CrossVal::fullset, CrossVal::lm, and CrossVal::n_rounds.

void reset () [virtual]
 

Cleaning up the learning model but keeping most settings.

Note:
This is probably needed after training or loading from a file, and before training again.

Reimplemented from LearnModel.

Definition at line 128 of file crossval.cpp.

References CrossVal::best, CrossVal::best_lm, CrossVal::err, and LearnModel::reset().

UINT rounds () const [inline]
 

how many rounds of cross-validation?

Definition at line 49 of file crossval.h.

References CrossVal::n_rounds.

bool serialize (std::ostream &, ver_list &) const [protected, virtual]
 

Reimplemented from LearnModel.

Reimplemented in vFoldCrossVal, and HoldoutCrossVal.

Definition at line 45 of file crossval.cpp.

References CrossVal::best, CrossVal::best_lm, CrossVal::err, CrossVal::lm, SERIALIZE_PARENT, and CrossVal::size().

void set_full_train (bool f = true) [inline]
 

Definition at line 54 of file crossval.h.

References CrossVal::fullset.

void set_rounds (UINT r) [inline]
 

specify the number of rounds of cross-validation

Definition at line 51 of file crossval.h.

References CrossVal::n_rounds.

Referenced by vFoldCrossVal::set_folds(), and HoldoutCrossVal::set_holdout().

void set_train_data (const pDataSet &pd, const pDataWgt &pw = 0) [virtual]
 

Set the data set and sample weight to be used in training.

If the learning model/algorithm can only train with uniform sample weight, i.e., support_weighted_data() returns false, a "bootstrapped" copy of the original data set will be generated and used in the following training. The bootstrapping is done by randomly picking samples (with replacement) w.r.t. the given weight pw.

To make life easier, when support_weighted_data() returns true, a null pw will be replaced by a uniformly distributed probability vector. So we have the following invariant

Invariant:
support_weighted_data() == (ptw != 0)
Parameters:
pd gives the data set.
pw gives the sample weight, whose default value is 0.
See also:
support_weighted_data(), train()

Reimplemented from LearnModel.

Definition at line 99 of file crossval.cpp.

References CrossVal::best, CrossVal::best_lm, CrossVal::lm, and LearnModel::set_train_data().

Referenced by vFoldCrossVal::cv_round().
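
The weighted bootstrapping described above can be sketched with the standard library alone. This is not LEMGA's implementation; bootstrap_indices() and uniform_weight() are hypothetical helpers, and representing the sample weight as a std::vector<double> is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draws n sample indices with replacement. Index i is picked with
// probability proportional to weight[i].
std::vector<std::size_t> bootstrap_indices(const std::vector<double>& weight,
                                           std::size_t n,
                                           std::mt19937& rng) {
    std::discrete_distribution<std::size_t> pick(weight.begin(), weight.end());
    std::vector<std::size_t> idx(n);
    for (std::size_t k = 0; k < n; ++k) idx[k] = pick(rng);
    return idx;
}

// Uniform probability vector of length n, the stand-in for a null
// sample weight when weighted data is supported.
std::vector<double> uniform_weight(std::size_t n) {
    return std::vector<double>(n, 1.0 / n);
}
```

The returned indices would then be used to copy the selected samples into a new, uniformly weighted data set. Because sampling is with replacement, heavily weighted samples may appear several times and lightly weighted ones may not appear at all.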

UINT size () const [inline]
 

the number of candidate models under cross-validation

Definition at line 43 of file crossval.h.

References CrossVal::err, and CrossVal::lm.

Referenced by vFoldCrossVal::cv_round(), CrossVal::error(), CrossVal::model(), and CrossVal::serialize().

void train () [virtual]
 

Train with preset data set and sample weight.

Implements LearnModel.

Definition at line 108 of file crossval.cpp.

References CrossVal::best, CrossVal::best_lm, CrossVal::cv_round(), CrossVal::err, CrossVal::fullset, CrossVal::lm, CrossVal::n_rounds, LearnModel::ptd, LearnModel::ptw, and LearnModel::set_dimensions().

bool unserialize (std::istream &, ver_list &, const id_t & = NIL_ID) [protected, virtual]
 

Reimplemented from LearnModel.

Reimplemented in vFoldCrossVal, and HoldoutCrossVal.

Definition at line 63 of file crossval.cpp.

References CrossVal::best, Object::create(), CrossVal::err, CrossVal::lm, Object::NIL_ID, UNSERIALIZE_PARENT, and LearnModel::valid_dimensions().


Member Data Documentation

int best [protected]
 

best_lm was actually lm[best]

Note:
Before cross-validation, best is -1. After, lm[best] is the best model. If full-training is required, best_lm is then assigned.

Definition at line 28 of file crossval.h.

Referenced by CrossVal::best_model(), CrossVal::CrossVal(), CrossVal::get_output(), CrossVal::margin(), CrossVal::margin_norm(), CrossVal::margin_of(), CrossVal::operator()(), CrossVal::operator=(), CrossVal::reset(), CrossVal::serialize(), CrossVal::set_train_data(), CrossVal::train(), and CrossVal::unserialize().
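
One way the best index could be derived from several cross-validation rounds is sketched below. The source does not spell out the selection rule, so this assumes that per-round errors are averaged per candidate and the smallest average wins; average_errors() and select_best() are hypothetical helpers, not LEMGA functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages per-round errors. rounds[r][m] is the error of candidate
// model m in round r; every round is assumed to cover all candidates.
std::vector<double> average_errors(
        const std::vector<std::vector<double>>& rounds) {
    if (rounds.empty()) return {};
    std::vector<double> avg(rounds.front().size(), 0.0);
    for (const auto& r : rounds)
        for (std::size_t m = 0; m < avg.size(); ++m) avg[m] += r[m];
    for (double& a : avg) a /= static_cast<double>(rounds.size());
    return avg;
}

// Index of the smallest error, or -1 when there are no candidates,
// mirroring the "best is -1 before cross-validation" convention.
int select_best(const std::vector<double>& err) {
    int best = -1;
    for (std::size_t m = 0; m < err.size(); ++m)
        if (best < 0 || err[m] < err[best]) best = static_cast<int>(m);
    return best;
}
```

Averaging over more rounds reduces the variance of each candidate's error estimate, which is the stated purpose of n_rounds.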

pLearnModel best_lm [protected]
 

the best model (trained on the full set)

Definition at line 27 of file crossval.h.

Referenced by CrossVal::best_model(), CrossVal::CrossVal(), CrossVal::get_output(), CrossVal::margin(), CrossVal::margin_norm(), CrossVal::margin_of(), CrossVal::operator()(), CrossVal::operator=(), CrossVal::reset(), CrossVal::serialize(), CrossVal::set_train_data(), and CrossVal::train().

std::vector<REAL> err [protected]
 

cross-validation errors

Definition at line 25 of file crossval.h.

Referenced by CrossVal::add_model(), CrossVal::error(), CrossVal::operator=(), CrossVal::reset(), CrossVal::serialize(), CrossVal::size(), CrossVal::train(), and CrossVal::unserialize().

bool fullset [protected]
 

train the best model on the full set?

Definition at line 23 of file crossval.h.

Referenced by CrossVal::full_train(), CrossVal::operator=(), CrossVal::set_full_train(), and CrossVal::train().

std::vector<pcLearnModel> lm [protected]
 

all candidate models

Definition at line 24 of file crossval.h.

Referenced by CrossVal::add_model(), CrossVal::best_model(), CrossVal::CrossVal(), vFoldCrossVal::cv_round(), CrossVal::model(), CrossVal::operator=(), CrossVal::serialize(), CrossVal::set_train_data(), CrossVal::size(), CrossVal::train(), and CrossVal::unserialize().

UINT n_rounds [protected]
 

# of CV rounds, to beat the variance

Definition at line 26 of file crossval.h.

Referenced by CrossVal::operator=(), CrossVal::rounds(), CrossVal::set_rounds(), and CrossVal::train().


The documentation for this class was generated from the following files:
crossval.h
crossval.cpp
Generated on Wed Nov 8 08:16:50 2006 for LEMGA by  doxygen 1.4.6