When studying the training error and the prediction error for
boosting, it is often assumed that the hypotheses returned by the base learner
are weakly accurate, that is, able to beat a random guesser by a certain
margin. It has been an open question how large this margin can be:
whether it eventually vanishes in the boosting process or remains bounded
below by a positive amount. This question is crucial for the behavior of both
the training error and the prediction error. In this paper we study this
problem and show affirmatively that the improvement over the random guesser
is bounded below by a positive amount for almost all possible sample
realizations and for most of the commonly used base hypotheses. This has a
number of implications for the prediction error, including, for example, that
boosting forever may not be good and that regularization may be necessary. The
problem is studied by first considering an analog of AdaBoost in regression,
where we study similar properties and find that, for good performance, one
cannot hope to avoid regularization by simply adapting the boosting device to
regression.
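To make the notion of "beating a random guesser by a margin" concrete, the following is a minimal sketch (not the paper's construction) of AdaBoost with decision stumps that records the edge gamma_t = 1/2 - eps_t of each base hypothesis, where eps_t is its weighted training error; the stump learner and the synthetic data are hypothetical illustrations.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted-error-minimizing decision stump (a simple base learner).
    Returns predictions in {-1, +1} of the best single-feature threshold rule
    and its weighted error."""
    best_err, best_pred = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1.0, -1.0):
                pred = sign * np.where(X[:, j] <= thr, 1.0, -1.0)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best_pred = err, pred
    return best_pred, best_err

def adaboost_edges(X, y, rounds=20):
    """Run AdaBoost and record the edge gamma_t = 1/2 - eps_t of each base
    hypothesis; the paper asks whether these edges stay bounded away from
    zero as boosting proceeds."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    edges = []
    for _ in range(rounds):
        pred, eps = fit_stump(X, y, w)
        eps = max(eps, 1e-12)            # guard against a perfect stump
        edges.append(0.5 - eps)          # improvement over random guessing
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        w *= np.exp(-alpha * y * pred)   # reweight: up-weight mistakes
        w /= w.sum()
    return edges

# Illustrative run on synthetic data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
print(adaboost_edges(X, y, rounds=10))
```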
Published: 2002-02-14
Classification: Angular span, boosting, classification, error bounds, least squares regression, matching pursuit, nearest neighbor rule, overfit, prediction error, regularization, training error, weak hypotheses
MSC: 62G99, 68T99
@article{1015362184,
author = {Jiang, Wenxin},
title = {On weak base hypotheses and their implications for boosting
regression and classification},
journal = {Ann. Statist.},
volume = {30},
number = {1},
year = {2002},
pages = {51--73},
language = {en},
url = {http://dml.mathdoc.fr/item/1015362184}
}
Jiang, Wenxin. On weak base hypotheses and their implications for boosting
regression and classification. Ann. Statist., Vol. 30 (2002), no. 1, pp. 51-73. http://gdmltest.u-ga.fr/item/1015362184/