A set of $n$ points in Euclidean space is partitioned into the $k$ groups that minimize the within-groups sum of squares. Under the assumption that the $n$ points come from independent sampling on a fixed distribution, conditions are found to assure asymptotic normality of the vector of means of the $k$ groups. The method of proof makes novel application of a functional central limit theorem for empirical processes, a generalization of Donsker's theorem due to Dudley.
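As a reading aid, the criterion described in the abstract can be sketched in symbols. The notation here is an assumption introduced for illustration and is not taken from this record: $x_1, \dots, x_n$ are the sample points, $a_1, \dots, a_k$ are candidate cluster centers, $b_n$ is the vector of optimal sample centers, $\mu$ is the corresponding vector of population centers, and $\Sigma$ is an unspecified limiting covariance matrix.

$$
W_n(a_1, \dots, a_k) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - a_j \rVert^2,
\qquad
b_n = \operatorname*{arg\,min}_{a_1, \dots, a_k} W_n(a_1, \dots, a_k).
$$

Minimizing $W_n$ over centers is equivalent to minimizing the within-groups sum of squares over partitions, since for any fixed partition the best centers are the group means. In this notation, the asymptotic normality announced in the abstract would take the schematic form

$$
\sqrt{n}\,(b_n - \mu) \xrightarrow{d} N(0, \Sigma),
$$

where the precise regularity conditions and the form of $\Sigma$ are those established in the paper and are not reproduced here.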
Published: 1982-11-14
Classification:
$k$-means clustering,
central limit theorem,
minimized within cluster sum of squares,
differentiability in quadratic mean,
Donsker classes of functions,
functional central limit theorem for empirical processes,
62H30,
60F05,
60F17
@article{1176993713,
author = {Pollard, David},
title = {A Central Limit Theorem for $k$-Means Clustering},
journal = {Ann. Probab.},
volume = {10},
number = {4},
year = {1982},
pages = {919--926},
language = {en},
url = {http://dml.mathdoc.fr/item/1176993713}
}
Pollard, David. A Central Limit Theorem for $k$-Means Clustering. Ann. Probab., Vol. 10 (1982) no. 4, pp. 919-926. http://gdmltest.u-ga.fr/item/1176993713/