A digitized handwritten numeral can be represented as a binary or
greyscale image. An important pattern recognition task that has received much
attention lately is to automatically determine the digit, given the image.
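For concreteness, the sketch below shows one common way such an image might be stored and handed to a classifier; the 16 x 16 size, the threshold, and the random pixel values are purely illustrative assumptions.

```python
import numpy as np

# A hypothetical 16 x 16 greyscale digit image: pixel intensities in [0, 1].
# (Random values stand in for a real scanned digit.)
rng = np.random.default_rng(0)
image = rng.random((16, 16))

# A binary version, obtained by thresholding the greyscale intensities.
binary = (image > 0.5).astype(int)

# For classification the image is treated as a single point in 256 dimensions.
x = image.ravel()
print(x.shape)        # (256,)
```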
While many different techniques have been pushed very hard to solve this task, the most successful and intuitively appropriate is due to Simard, Le Cun and Denker (1993). Their approach combined nearest-neighbor classification with a subject-specific invariant metric that allows for small rotations, translations and other natural transformations. We report on Simard's classifier and compare it to other approaches. One important negative aspect of nearest-neighbor classification is that all the work gets done at lookup time: with around 10,000 training images in high dimensions, this can be computationally exorbitant.
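To make the lookup-time cost concrete, here is a minimal sketch of nearest-neighbor classification over a stored prototype set, with plain Euclidean distance standing in for the invariant (tangent) metric; the arrays below are hypothetical stand-ins for real training images and labels.

```python
import numpy as np

def nearest_neighbor_classify(query, prototypes, labels):
    """Return the label of the prototype closest to `query`.

    Euclidean distance is used here as a stand-in for the tangent
    distance; every query requires a pass over all prototypes, which
    is where the lookup-time cost comes from.
    """
    dists = np.linalg.norm(prototypes - query, axis=1)
    return labels[np.argmin(dists)]

# Hypothetical data: 10,000 prototypes, each a 256-dimensional image vector.
rng = np.random.default_rng(0)
prototypes = rng.random((10_000, 256))
labels = rng.integers(0, 10, size=10_000)
query = rng.random(256)
print(nearest_neighbor_classify(query, prototypes, labels))
```

Every query scans all 10,000 prototypes, so even with this simple metric a single classification costs on the order of 10,000 x 256 distance operations; the tangent metric is more expensive still.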
In this paper we develop rich models for representing large subsets of the prototypes. One example is a low-dimensional hyperplane defined by a point and a set of basis or tangent vectors. The components of these models are learned from the training set, chosen to minimize the average tangent distance from a subset of the training images; as such they are similar in flavor to the singular value decomposition (SVD), which finds the closest hyperplanes in Euclidean distance. These models are used either singly per class or as basic building blocks in conjunction with the $K$-means clustering algorithm.
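As a rough Euclidean analogue of these models, the sketch below fits a low-dimensional hyperplane (a point plus $r$ basis vectors) to a set of images via the SVD and measures the distance from a new image to that hyperplane; in the paper the fitting criterion and the distance are the tangent distance rather than the Euclidean one, and the rank $r$, the one-model-per-class usage, and the data here are illustrative assumptions.

```python
import numpy as np

def fit_svd_model(X, r):
    """Fit an r-dimensional affine subspace (point + basis) to the rows of X.

    The centroid and the top r right singular vectors give the hyperplane
    closest to the data in average squared Euclidean distance.
    """
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:r]                      # point and basis vectors

def distance_to_model(x, mu, V):
    """Euclidean distance from image x to the affine subspace (mu, V)."""
    d = x - mu
    proj = V.T @ (V @ d)                   # projection onto the subspace
    return np.linalg.norm(d - proj)

# Hypothetical use: one model per digit class, fitted to that class's images.
rng = np.random.default_rng(0)
X_class = rng.random((1000, 256))          # stand-in for images of one digit
mu, V = fit_svd_model(X_class, r=12)
print(distance_to_model(rng.random(256), mu, V))
```

When such models are combined with $K$-means, one plausible reading is that each fitted hyperplane plays the role of a cluster "center", with images assigned to the nearest model rather than the nearest point; the precise algorithm is developed later in the paper.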