Two-Sample Tests for Multivariate Distributions
Weiss, Lionel
Ann. Math. Statist., Volume 31 (1960) no. 4, p. 159-164 / Harvested from Project Euclid
X(1), X(2), \cdots, X(m), Y(1), Y(2), \cdots, Y(n) are independent k-variate random variables. The distribution of X(i) has pdf f(x), say, where x denotes a k-dimensional vector throughout this paper, and the distribution of Y(j) has pdf g(x), say. We assume that f(x) and g(x) are piecewise continuous, and that each has a finite upper bound, which it is not necessary to specify. Denote by 2R_i the distance from X(i) to the nearest of the points X(1), \cdots, X(i - 1), X(i + 1), \cdots, X(m), and denote by S_i the number of points Y(1), \cdots, Y(n) contained in the open sphere \{x: | x - X(i) | < R_i\}. Clearly, the joint distribution of S_i, S_j is the same as the joint distribution of S_{i'}, S_{j'}, for any subscripts with i \neq j, i' \neq j'.
Let r be a non-negative integer, and \alpha any fixed positive value. Q(r) denotes the Lebesgue integral \int_{E_k} \frac{2^k \alpha f^2(x) \lbrack g(x) \rbrack^r}{\lbrack g(x) + 2^k \alpha f(x) \rbrack^{r + 1}} dx, where E_k denotes Euclidean k-space. We will show that \lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}\lbrack S_1 = s_1, S_2 = s_2\rbrack = Q(s_1)Q(s_2), for any non-negative integers s_1, s_2, the approach being uniform in s_1, s_2. Thus, in the limit S_1, S_2 are independently distributed, with \lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}\lbrack S_1 = s_1\rbrack = Q(s_1).
In [1], which discussed the univariate case, S_i was defined as the number of Y's closer to X(i) than to any other X to their right. In the present paper, S_i is defined as the number of Y's in another neighborhood of X(i). Our present definition of S_i does not become for k = 1 the same as the definition of S_i in [1]. Rather, in the univariate case, our present definition of S_i is the number of Y's lying within a distance R_i on either side of X(i).
However, if \lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}\lbrack S_1 = s_1, S_2 = s_2\rbrack is computed for the univariate case using the definition of S_i given in [1], the only way in which it differs from Q(s_1)Q(s_2) is that \alpha is replaced by \alpha/2. Thus it seems reasonable to treat the S_i as defined here as k-dimensional analogues of the S_i as defined in [1], at least for large samples. An intuitive reason for \alpha being replaced by \alpha/2 is that in our present case, \sum^m_{i = 1} S_i may be less than n, whereas in [1] this sum must always equal n. Thus in our present case, we are in a sense discarding some of the Y's, which lowers n relative to m and thus raises \alpha by a certain factor (2, as it happens). In our present case, \sum S_i may be less than n because the R_i are chosen to make the spheres around the X's non-overlapping, thus simplifying the analysis. The R_i were chosen to give the largest possible non-overlapping spheres because it would seem intuitively that the larger the spheres, the more rapid the approach of the probabilities to their limiting values.
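The quantities R_i and S_i defined in the abstract can be computed directly from two samples. The following is a minimal sketch in Python with NumPy, not code from the paper: the function names `weiss_counts` and `q_equal_densities`, the brute-force distance computation, and the uniform test data are all assumptions made for illustration. The second function uses the special case f = g, where substituting into the integral for Q(r) gives Q(r) = c / (1 + c)^{r + 1} with c = 2^k \alpha, a geometric distribution in r.

```python
import numpy as np


def weiss_counts(X, Y):
    """Return (R, S) for samples X (m x k) and Y (n x k).

    R[i] is half the distance from X[i] to its nearest neighbour among
    the other X's; S[i] is the number of Y's in the open sphere
    {x : |x - X[i]| < R[i]}.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Pairwise Euclidean distances among the X's (brute force, O(m^2 k)).
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dxx, np.inf)  # exclude each point's distance to itself
    R = dxx.min(axis=1) / 2.0
    # Distances from every X[i] to every Y[j]; strict inequality = open sphere.
    dxy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    S = (dxy < R[:, None]).sum(axis=1)
    return R, S


def q_equal_densities(r, k, alpha):
    """Limiting probability Q(r) in the special case f = g (assumed here)."""
    c = 2.0 ** k * alpha
    return c / (1.0 + c) ** (r + 1)


if __name__ == "__main__":
    # Illustrative data only: both samples uniform on the unit square.
    rng = np.random.default_rng(0)
    m, n, k = 400, 400, 2
    X = rng.random((m, k))
    Y = rng.random((n, k))
    _, S = weiss_counts(X, Y)
    freq = np.bincount(S, minlength=4) / m
    for r in range(4):
        print(r, freq[r], q_equal_densities(r, k, m / n))
```

The demo prints the empirical frequencies of S_i = r next to the limiting values for f = g. Since the result is asymptotic, only rough agreement should be expected at finite m. The brute-force distance matrices need memory of order m^2 k and m n k, so a spatial index would be a better choice for large samples.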
Published: 1960-03-14
@article{1177705995,
     author = {Weiss, Lionel},
     title = {Two-Sample Tests for Multivariate Distributions},
     journal = {Ann. Math. Statist.},
     volume = {31},
     number = {4},
     year = {1960},
     pages = {159-164},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1177705995}
}
Weiss, Lionel. Two-Sample Tests for Multivariate Distributions. Ann. Math. Statist., Volume 31 (1960) no. 4, pp. 159-164. http://gdmltest.u-ga.fr/item/1177705995/