CrossVal Class Reference

A combination of cross-validation and model selection.

#include <crossval.h>


Public Member Functions
	CrossVal ()
	CrossVal (const CrossVal &)
const CrossVal &	operator= (const CrossVal &)
virtual CrossVal *	create () const =0
	Create a new object using the default constructor.
virtual CrossVal *	clone () const =0
	Create a new object by replicating itself.
void	add_model (const LearnModel &)
	add a candidate model to be cross-validated
UINT	size () const
	the number of candidate models under cross-validation
const LearnModel &	model (UINT n) const
	the n-th candidate model
UINT	rounds () const
	how many rounds of cross-validation?
void	set_rounds (UINT r)
	specify the number of rounds of cross-validation
bool	full_train () const
	train the best model on the full set?
void	set_full_train (bool f=true)
virtual void	set_train_data (const pDataSet &, const pDataWgt &=0)
	Set the data set and sample weight to be used in training.
virtual void	train ()
	Train with preset data set and sample weight.
virtual void	reset ()
virtual Output	operator() (const Input &x) const
virtual Output	get_output (UINT i) const
	Get the output of the hypothesis on the i-th input.
virtual REAL	margin_norm () const
	The normalization term for margins.
virtual REAL	margin_of (const Input &x, const Output &y) const
	Report the (unnormalized) margin of an example (x, y).
virtual REAL	margin (UINT i) const
	Report the (unnormalized) margin of the example i.
REAL	error (UINT n) const
	the cross-validation error of the n-th candidate model
const LearnModel &	best_model () const
	the best model (trained if full_train() == true)
Protected Member Functions
virtual std::vector< REAL >	cv_round () const =0
	one round of the cross-validation operation
virtual bool	serialize (std::ostream &, ver_list &) const
virtual bool	unserialize (std::istream &, ver_list &, const id_t &=NIL_ID)
Protected Attributes
bool	fullset
	train the best model on the full set?
std::vector< pcLearnModel >	lm
	all candidate models
std::vector< REAL >	err
	cross-validation errors
UINT	n_rounds
	number of CV rounds, used to reduce the variance of the error estimate
pLearnModel	best_lm
	the best model (trained on the full set)
int	best

Detailed Description

A combination of cross-validation and model selection.

Note: The interface is experimental. For instance, it might end up under LearnModel.

Definition at line 21 of file crossval.h.

Constructor & Destructor Documentation

CrossVal ( ) [inline]

Definition at line 33 of file crossval.h.

CrossVal ( const CrossVal & )

Definition at line 15 of file crossval.cpp.
References CrossVal::best, CrossVal::best_lm, and CrossVal::lm.

Member Function Documentation

void add_model ( const LearnModel & )

add a candidate model to be cross-validated

Definition at line 93 of file crossval.cpp.
References LearnModel::clone(), CrossVal::err, CrossVal::lm, and LearnModel::set_dimensions().

const LearnModel& best_model ( ) const [inline]

the best model (trained if full_train() == true)

Definition at line 79 of file crossval.h.
References CrossVal::best, CrossVal::best_lm, and CrossVal::lm.

virtual CrossVal* clone ( ) const [pure virtual]

Create a new object by replicating itself.

Returns:
A pointer to the new copy.
The code for a derived class Derived is always
return new Derived(*this);
Though seemingly redundant, it helps to copy an object without knowing the real type of the object.
See also:
C++ FAQ Lite 20.6

Implements LearnModel.
Implemented in vFoldCrossVal, and HoldoutCrossVal.

virtual CrossVal* create ( ) const [pure virtual]

Create a new object using the default constructor.
The code for a derived class Derived is always
return new Derived();

Implements LearnModel.
Implemented in vFoldCrossVal, and HoldoutCrossVal.
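
The create()/clone() idiom documented for these two members can be illustrated with a minimal, self-contained sketch. The classes Base and Derived and the helper duplicate() below are hypothetical stand-ins invented for illustration; they are not LEMGA classes, and the sketch only assumes the one-line bodies quoted above.

```cpp
#include <memory>
#include <string>

// Hypothetical base class: models only the two factory members.
class Base {
public:
    virtual ~Base() {}
    virtual Base* create() const = 0;  // new default-constructed object
    virtual Base* clone() const = 0;   // copy of *this, real type preserved
    virtual std::string name() const = 0;
};

// Hypothetical derived class following the documented one-liners.
class Derived : public Base {
public:
    Derived() : rounds_(1) {}
    explicit Derived(unsigned r) : rounds_(r) {}
    virtual Derived* create() const { return new Derived(); }
    virtual Derived* clone() const { return new Derived(*this); }
    virtual std::string name() const { return "Derived"; }
    unsigned rounds() const { return rounds_; }
private:
    unsigned rounds_;
};

// Copies an object through a base-class reference without knowing
// its real type; ownership is handed to a smart pointer.
std::unique_ptr<Base> duplicate(const Base& b) {
    return std::unique_ptr<Base>(b.clone());
}
```

Calling duplicate() on a Derived object yields another Derived with the same state, whereas create() yields a Derived in its default state. Wrapping the raw pointer in std::unique_ptr is a choice made for this sketch only.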

virtual std::vector<REAL> cv_round ( ) const [protected, pure virtual]

one round of the cross-validation operation

Implemented in vFoldCrossVal, and HoldoutCrossVal.
Referenced by CrossVal::train().

REAL error ( UINT n ) const [inline]

the cross-validation error of the n-th candidate model

Definition at line 76 of file crossval.h.
References CrossVal::err, and CrossVal::size().

bool full_train ( ) const [inline]

train the best model on the full set?

Definition at line 53 of file crossval.h.
References CrossVal::fullset.

virtual Output get_output ( UINT i ) const [inline, virtual]

Get the output of the hypothesis on the i-th input.

Note:
It is possible to cache results to save computational effort.

Reimplemented from LearnModel.
Definition at line 62 of file crossval.h.
References CrossVal::best, CrossVal::best_lm, and LearnModel::ptd.

virtual REAL margin ( UINT i ) const [inline, virtual]

Report the (unnormalized) margin of the example i.

Note:
It is possible to cache results to save computational effort.

Reimplemented from LearnModel.
Definition at line 71 of file crossval.h.
References CrossVal::best, CrossVal::best_lm, and LearnModel::ptd.

virtual REAL margin_norm ( ) const [inline, virtual]

The normalization term for margins.
The margin concept can be normalized or unnormalized. For example, for a perceptron model, the unnormalized margin is the weighted sum of the input features, the normalized margin is the distance to the hyperplane, and the normalization term is the norm of the hyperplane weight.
Since the normalization term is usually a constant, it is more efficient to precompute it than to calculate it every time a margin is requested. The best way is to use a cache. Here I use an easier way: let the users decide when to compute the normalization term.
Reimplemented from LearnModel.
Definition at line 65 of file crossval.h.
References CrossVal::best, and CrossVal::best_lm.
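
As a rough illustration of the perceptron example above, the following sketch computes unnormalized margins and divides them by a normalization term that the caller computes once. LinearModel and normalized_margins() are hypothetical names, not part of LEMGA, and the sketch assumes labels in {-1, +1} and a nonzero weight vector.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical linear (perceptron-like) model; not a LEMGA class.
struct LinearModel {
    std::vector<double> w;  // hyperplane weight

    // Normalization term: the Euclidean norm of the weight vector.
    double margin_norm() const {
        double s = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) s += w[i] * w[i];
        return std::sqrt(s);
    }

    // Unnormalized margin of (x, y), assuming y is -1 or +1: y * (w . x).
    double margin_of(const std::vector<double>& x, double y) const {
        double s = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) s += w[i] * x[i];
        return y * s;
    }
};

// Normalizes a batch of margins. The caller decides when the norm is
// computed: here it is computed once, outside the loop.
std::vector<double> normalized_margins(
        const LinearModel& m,
        const std::vector<std::vector<double> >& xs,
        const std::vector<double>& ys) {
    const double norm = m.margin_norm();  // assumed nonzero
    std::vector<double> out(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i)
        out[i] = m.margin_of(xs[i], ys[i]) / norm;
    return out;
}
```

With w = (3, 4), the normalization term is 5, so an example with an unnormalized margin of 7 has a normalized margin of 1.4.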

virtual REAL margin_of ( const Input & x, const Output & y ) const [inline, virtual]

Report the (unnormalized) margin of an example (x, y).

Reimplemented from LearnModel.
Definition at line 68 of file crossval.h.
References CrossVal::best, and CrossVal::best_lm.

const LearnModel& model ( UINT n ) const [inline]

the n-th candidate model

Definition at line 45 of file crossval.h.
References CrossVal::lm, and CrossVal::size().

virtual Output operator() ( const Input & x ) const [inline, virtual]

Implements LearnModel.
Definition at line 59 of file crossval.h.
References CrossVal::best, and CrossVal::best_lm.

const CrossVal & operator= ( const CrossVal & )

Definition at line 26 of file crossval.cpp.
References CrossVal::best, CrossVal::best_lm, CrossVal::err, CrossVal::fullset, CrossVal::lm, and CrossVal::n_rounds.

void reset ( ) [virtual]

Cleaning up the learning model but keeping most settings.
Note:
This is probably needed after training or loading from a file, but before training again.

Reimplemented from LearnModel.
Definition at line 128 of file crossval.cpp.
References CrossVal::best, CrossVal::best_lm, CrossVal::err, and LearnModel::reset().

UINT rounds ( ) const [inline]

how many rounds of cross-validation?

Definition at line 49 of file crossval.h.
References CrossVal::n_rounds.

bool serialize ( std::ostream & , ver_list & ) const [protected, virtual]

Reimplemented from LearnModel.
Reimplemented in vFoldCrossVal, and HoldoutCrossVal.
Definition at line 45 of file crossval.cpp.
References CrossVal::best, CrossVal::best_lm, CrossVal::err, CrossVal::lm, SERIALIZE_PARENT, and CrossVal::size().

void set_full_train ( bool f = true ) [inline]

Definition at line 54 of file crossval.h.
References CrossVal::fullset.

void set_rounds ( UINT r ) [inline]

specify the number of rounds of cross-validation

Definition at line 51 of file crossval.h.
References CrossVal::n_rounds.
Referenced by vFoldCrossVal::set_folds(), and HoldoutCrossVal::set_holdout().

void set_train_data ( const pDataSet & pd, const pDataWgt & pw = 0 ) [virtual]

Set the data set and sample weight to be used in training.
If the learning model/algorithm can only train with uniform sample weights, i.e., support_weighted_data() returns false, a "bootstrapped" copy of the original data set is generated and used in the subsequent training. The bootstrapping is done by randomly picking samples (with replacement) according to the given weight pw.
To make life easier, when support_weighted_data() returns true, a null pw is replaced by a uniformly distributed probability vector. So we have the following invariant:
Invariant:
support_weighted_data() == (ptw != 0)

Parameters:
pd	gives the data set.
pw	gives the sample weight, whose default value is 0.

See also:
support_weighted_data(), train()

Reimplemented from LearnModel.
Definition at line 99 of file crossval.cpp.
References CrossVal::best, CrossVal::best_lm, CrossVal::lm, and LearnModel::set_train_data().
Referenced by vFoldCrossVal::cv_round().
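
The weighted bootstrapping described above (picking samples with replacement according to the sample weight) can be sketched as follows. This is only an illustration using the standard library: bootstrap_indices() and uniform_weight() are hypothetical helpers, and nothing here is taken from the LEMGA implementation.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws n sample indices with replacement; index i is picked with
// probability proportional to weight[i]. The seed parameter is an
// assumption made here so that the draw is reproducible.
std::vector<std::size_t> bootstrap_indices(
        const std::vector<double>& weight, std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::discrete_distribution<std::size_t> pick(weight.begin(), weight.end());
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = pick(gen);
    return idx;
}

// Uniform probability vector, as a stand-in for what replaces a null
// weight when weighted data is supported.
std::vector<double> uniform_weight(std::size_t n) {
    return std::vector<double>(n, 1.0 / n);
}
```

The returned indices would then be used to build the resampled copy of the data set; samples with a weight of zero are never drawn.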

UINT size ( ) const [inline]

the number of candidate models under cross-validation

Definition at line 43 of file crossval.h.
References CrossVal::err, and CrossVal::lm.
Referenced by vFoldCrossVal::cv_round(), CrossVal::error(), CrossVal::model(), and CrossVal::serialize().

void train ( ) [virtual]

Train with preset data set and sample weight.

Implements LearnModel.
Definition at line 108 of file crossval.cpp.
References CrossVal::best, CrossVal::best_lm, CrossVal::cv_round(), CrossVal::err, CrossVal::fullset, CrossVal::lm, CrossVal::n_rounds, LearnModel::ptd, LearnModel::ptw, and LearnModel::set_dimensions().
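
The reference list above suggests that train() runs cv_round() for n_rounds rounds, accumulates the errors in err, and records the index of the winning candidate in best. The sketch below shows one plausible form of that selection step. It is an assumption, not the actual code in crossval.cpp: in particular, averaging the per-round errors and picking the smallest one are guesses, and average_errors() and select_best() are hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

// Averages per-model errors over all rounds, where round_errors[r][m]
// is the error of candidate model m in round r (the kind of vector a
// single cross-validation round would return).
std::vector<double> average_errors(
        const std::vector<std::vector<double> >& round_errors) {
    std::vector<double> avg;
    if (round_errors.empty()) return avg;
    avg.assign(round_errors[0].size(), 0.0);
    for (std::size_t r = 0; r < round_errors.size(); ++r)
        for (std::size_t m = 0; m < avg.size(); ++m)
            avg[m] += round_errors[r][m];
    for (std::size_t m = 0; m < avg.size(); ++m)
        avg[m] /= round_errors.size();
    return avg;
}

// Returns the index of the smallest error, or -1 when there are no
// candidates (mirroring the documented "best is -1" initial state).
int select_best(const std::vector<double>& err) {
    int best = -1;
    for (std::size_t m = 0; m < err.size(); ++m)
        if (best < 0 || err[m] < err[best]) best = static_cast<int>(m);
    return best;
}
```

If full training is requested, the selected candidate would then be cloned and trained on the full data set; that step is omitted here because it depends on the LearnModel interface.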

bool unserialize ( std::istream & , ver_list & , const id_t & = NIL_ID ) [protected, virtual]

Reimplemented from LearnModel.
Reimplemented in vFoldCrossVal, and HoldoutCrossVal.
Definition at line 63 of file crossval.cpp.
References CrossVal::best, Object::create(), CrossVal::err, CrossVal::lm, Object::NIL_ID, UNSERIALIZE_PARENT, and LearnModel::valid_dimensions().

Member Data Documentation

int best [protected]

Index of the best model in lm (best_lm was originally lm[best]).
Note:
Before cross-validation, best is -1. After, lm[best] is the best model. If full-training is required, best_lm is then assigned.

Definition at line 28 of file crossval.h.
Referenced by CrossVal::best_model(), CrossVal::CrossVal(), CrossVal::get_output(), CrossVal::margin(), CrossVal::margin_norm(), CrossVal::margin_of(), CrossVal::operator()(), CrossVal::operator=(), CrossVal::reset(), CrossVal::serialize(), CrossVal::set_train_data(), CrossVal::train(), and CrossVal::unserialize().

pLearnModel best_lm [protected]

the best model (trained on the full set)

Definition at line 27 of file crossval.h.
Referenced by CrossVal::best_model(), CrossVal::CrossVal(), CrossVal::get_output(), CrossVal::margin(), CrossVal::margin_norm(), CrossVal::margin_of(), CrossVal::operator()(), CrossVal::operator=(), CrossVal::reset(), CrossVal::serialize(), CrossVal::set_train_data(), and CrossVal::train().

std::vector<REAL> err [protected]

cross-validation errors

Definition at line 25 of file crossval.h.
Referenced by CrossVal::add_model(), CrossVal::error(), CrossVal::operator=(), CrossVal::reset(), CrossVal::serialize(), CrossVal::size(), CrossVal::train(), and CrossVal::unserialize().

bool fullset [protected]

train the best model on the full set?

Definition at line 23 of file crossval.h.
Referenced by CrossVal::full_train(), CrossVal::operator=(), CrossVal::set_full_train(), and CrossVal::train().

std::vector<pcLearnModel> lm [protected]

all candidate models

Definition at line 24 of file crossval.h.
Referenced by CrossVal::add_model(), CrossVal::best_model(), CrossVal::CrossVal(), vFoldCrossVal::cv_round(), CrossVal::model(), CrossVal::operator=(), CrossVal::serialize(), CrossVal::set_train_data(), CrossVal::size(), CrossVal::train(), and CrossVal::unserialize().

UINT n_rounds [protected]

number of CV rounds, used to reduce the variance of the error estimate

Definition at line 26 of file crossval.h.
Referenced by CrossVal::operator=(), CrossVal::rounds(), CrossVal::set_rounds(), and CrossVal::train().

The documentation for this class was generated from the following files:

crossval.h
crossval.cpp

Generated on Wed Nov 8 08:16:50 2006 for LEMGA by Doxygen 1.4.6
