Finite sample approximation results for principal component analysis: A matrix perturbation approach
Nadler, Boaz
Ann. Statist., Tome 36 (2008) no. 1, p. 2791-2817 / Harvested from Project Euclid
Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n→∞. As in machine learning, we present a finite sample theorem which holds with high probability for the closeness between the leading eigenvalue and eigenvector of sample PCA and population PCA under a spiked covariance model. In addition, we also consider the relation between finite sample PCA and the asymptotic results in the joint limit p, n→∞, with p/n=c. We present a matrix perturbation view of the “phase transition phenomenon,” and a simple linear-algebra based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit. Moreover, our analysis also applies for finite p, n where we show that although there is no sharp phase transition as in the infinite case, either as a function of noise level or as a function of sample size n, the eigenvector of sample PCA may exhibit a sharp “loss of tracking,” suddenly losing its relation to the (true) eigenvector of the population PCA matrix. This occurs due to a crossover between the eigenvalue due to the signal and the largest eigenvalue due to noise, whose eigenvector points in a random direction.
Publié le : 2008-12-15
Classification:  Principal component analysis,  spiked covariance model,  random matrix theory,  matrix perturbation,  phase transition,  62H25,  62E17,  15A42
@article{1231165185,
     author = {Nadler, Boaz},
     title = {Finite sample approximation results for principal component analysis: A matrix perturbation approach},
     journal = {Ann. Statist.},
     volume = {36},
     number = {1},
     year = {2008},
     pages = { 2791-2817},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1231165185}
}
Nadler, Boaz. Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist., Tome 36 (2008) no. 1, pp.  2791-2817. http://gdmltest.u-ga.fr/item/1231165185/