Given $n$ independent replicates of a jointly distributed pair $(X,
Y) \in \mathscr{R}^d \times \mathscr{R}$, we wish to select from a fixed
sequence of model classes $\mathscr{F}_1, \mathscr{F}_2,\dots$ a deterministic
prediction rule $f: \mathscr{R}^d \to \mathscr{R}$ whose risk is small. We
investigate the possibility of empirically assessing the complexity of each
model class, that is, the actual difficulty of the estimation problem within
each class. The estimated complexities are in turn used to define an adaptive
model selection procedure based on complexity-penalized empirical risk.
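For concreteness, one standard formalization (the loss function $\ell$ below is a generic placeholder, not fixed by the discussion above) writes the risk of a rule $f$ and its empirical counterpart as
\[
R(f) \;=\; \mathbf{E}\,\ell\bigl(f(X),Y\bigr),
\qquad
\hat{R}_n(f) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(X_i),Y_i\bigr),
\]
and the goal is to select, from the given model classes, a rule with small $R(f)$ on the basis of the sample.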

The available data are divided into two parts. The first is used to
form an empirical cover of each model class, and the second is used to select a
candidate rule from each cover based on empirical risk. The covering radii are
determined empirically to optimize a tight upper bound on the estimation error.
An estimate is chosen from the list of candidates in order to minimize the sum
of class complexity and empirical risk. A distinguishing feature of the
approach is that the complexity of each model class is assessed empirically,
based on the size of its empirical cover.
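As a rough illustration of the two-part scheme just described (not the exact procedure analyzed here), the sketch below builds a greedy empirical cover of each finite model class on the first half of the data, picks the empirical risk minimizer over each cover on the second half, and selects the candidate minimizing penalized empirical risk. The fixed covering radius, the greedy cover construction, and the penalty $\sqrt{\log(\text{cover size}+1)/m}$ are simplifying assumptions; in the actual procedure the radii are chosen from the data to optimize a bound on the estimation error.
\begin{verbatim}
import numpy as np

def greedy_cover(preds, radius):
    """Greedy cover of a finite set of prediction vectors under the
    empirical L1 distance; returns indices of the chosen centers."""
    centers = []
    for i in range(len(preds)):
        if all(np.mean(np.abs(preds[i] - preds[j])) > radius for j in centers):
            centers.append(i)
    return centers

def select_rule(model_classes, X, Y, loss, radius):
    """Hypothetical sketch: covers are built on the first half of the data,
    candidate rules are chosen by empirical risk on the second half, and the
    final rule minimizes empirical risk plus an estimated class complexity."""
    n = len(Y)
    m = n // 2
    X1, X2, Y2 = X[:m], X[m:], Y[m:]
    best_rule, best_score = None, np.inf
    for rules in model_classes:          # each class: a finite list of rules
        preds1 = np.array([[f(x) for x in X1] for f in rules])
        cover = greedy_cover(preds1, radius)
        # estimated class complexity: based on the empirical cover size
        complexity = np.sqrt(np.log(len(cover) + 1.0) / (n - m))
        risks = [np.mean([loss(rules[j](x), y) for x, y in zip(X2, Y2)])
                 for j in cover]
        k = int(np.argmin(risks))
        score = risks[k] + complexity    # complexity-penalized empirical risk
        if score < best_score:
            best_rule, best_score = rules[cover[k]], score
    return best_rule
\end{verbatim}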

Finite-sample performance bounds are established for the estimates,
and these bounds are applied to several nonparametric estimation problems. The
estimates are shown to achieve a favorable trade-off between approximation and
estimation error and to perform as well as if the distribution-dependent
complexities of the model classes were known beforehand. In addition, it is
shown that the estimates can be consistent, and even attain near-optimal rates
of convergence, when each model class has infinite VC dimension or pseudo-dimension.

For regression estimation with squared loss, we modify our estimate to achieve a
faster rate of convergence.