A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences
Henze, Norbert
Ann. Statist., Volume 16 (1988) no. 1, pp. 772-783 / Harvested from Project Euclid
For independent $d$-variate random samples $X_1, \cdots, X_{n_1}$ i.i.d. $f(x), Y_1, \cdots, Y_{n_2}$ i.i.d. $g(x)$, where the densities $f$ and $g$ are assumed to be continuous a.e., consider the number $T$ of all $k$ nearest neighbor comparisons in which observations and their neighbors belong to the same sample. We show that, if $f = g$ a.e., the limiting (normal) distribution of $T$, as $\min(n_1, n_2) \rightarrow \infty, n_1/(n_1 + n_2) \rightarrow \tau, 0 < \tau < 1$, does not depend on $f$. An omnibus procedure for testing the hypothesis $H_0: f = g$ a.e. is obtained by rejecting $H_0$ for large values of $T$. The result applies to a general distance (generated by a norm on $\mathbb{R}^d$) for determining nearest neighbors, and it generalizes to the multisample situation.
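As an illustration of the statistic described in the abstract, the following is a minimal sketch in Python that counts, over the pooled sample, the $k$ nearest neighbor comparisons in which a point and its neighbor come from the same sample. The function names `nn_coincidences` and `permutation_pvalue` are hypothetical, the Euclidean norm is assumed as the distance, and ties are assumed absent. The paper calibrates $T$ through its limiting normal distribution; the permutation calibration below is a substitute technique used here only to keep the sketch self-contained, not the author's procedure.

```python
import numpy as np


def nn_coincidences(x, y, k=1):
    """Count k-nearest-neighbor comparisons that stay within one sample.

    x, y: arrays of shape (n1, d) and (n2, d). Euclidean distance is an
    assumption of this sketch; the abstract allows any norm on R^d.
    """
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.zeros(len(x), dtype=int),
                             np.ones(len(y), dtype=int)])
    # Pairwise Euclidean distances in the pooled sample.
    diff = pooled[:, None, :] - pooled[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A point is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors of each pooled point.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # A coincidence: the neighbor carries the same sample label.
    return int((labels[neighbors] == labels[:, None]).sum())


def permutation_pvalue(x, y, k=1, n_perm=999, seed=0):
    """Permutation p-value for large T (a swapped-in calibration,
    not the asymptotic normal approximation from the paper)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1 = len(x)
    observed = nn_coincidences(x, y, k)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        t = nn_coincidences(pooled[perm[:n1]], pooled[perm[n1:]], k)
        count += t >= observed
    return observed, (count + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(x, y, k=3)` on two arrays with the same number of columns returns the observed count together with an approximate p-value; large counts, meaning neighbors mostly share a sample, are evidence against $H_0$. The dense distance matrix keeps the sketch short but uses memory quadratic in $n_1 + n_2$.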
Published: 1988-06-14
Classification: Multivariate two-sample test, nearest neighbor-type coincidences, 62H15, 62G10
@article{1176350835,
     author = {Henze, Norbert},
     title = {A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences},
     journal = {Ann. Statist.},
     volume = {16},
     number = {1},
     year = {1988},
     pages = {772--783},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176350835}
}
Henze, Norbert. A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. Ann. Statist., Volume 16 (1988) no. 1, pp. 772-783. http://gdmltest.u-ga.fr/item/1176350835/