Suppose we wish to construct a variable $k$-cell histogram based on an independent, identically distributed sample of size $n - 1$ from an unknown density $f$ on an interval of finite length. A variable cell histogram requires the cutpoints and heights of all of its cells to be specified. We propose the following procedure: (i) choose from the order statistics of the sample a set of $k + 1$ cutpoints that maximize a criterion, a function of the sample spacings; (ii) compute the heights of the $k$ cells according to a formula. The resulting histogram estimates a $k$-cell theoretical histogram that stays constant within a cell and that minimizes the Hellinger distance to the density $f$. The histogram tends to estimate low density regions accurately and is easy to compute. We find that a number of cells of order $n^{1/3}$ minimizes the mean Hellinger distance between the density $f$ and a class of histograms whose cutpoints are chosen from the order statistics.
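The two-step procedure in the abstract can be illustrated with a minimal sketch. The abstract states neither the spacings-based criterion nor the height formula, so the code below substitutes stand-ins that are assumptions, not the paper's method: the criterion is the histogram log-likelihood, each spacing is given mass $1/m$ where $m$ is the number of spacings, and the cutpoints are found by dynamic programming over the order statistics. The function name `variable_cell_histogram` and the choice `k = round(n ** (1 / 3))` (constant factor 1) are likewise hypothetical.

```python
import math
import random


def variable_cell_histogram(sample, k):
    """Return (cutpoints, heights) for a k-cell histogram.

    ASSUMPTION: the score is the log-likelihood sum_j c_j * log(c_j / (m * w_j)),
    a stand-in for the paper's criterion, which the abstract does not give.
    """
    x = sorted(sample)          # order statistics
    m = len(x) - 1              # number of sample spacings
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= len(sample) - 1")
    if any(x[i] == x[i + 1] for i in range(m)):
        raise ValueError("tied observations give zero-width spacings")

    def score(a, b):
        # Contribution of a cell spanning order statistics a..b (b - a spacings).
        c = b - a
        return c * math.log(c / (m * (x[b] - x[a])))

    neg_inf = float("-inf")
    # best[j][b]: best score using j cells whose last cutpoint is x[b].
    best = [[neg_inf] * (m + 1) for _ in range(k + 1)]
    back = [[-1] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for b in range(j, m + 1):
            for a in range(j - 1, b):
                if best[j - 1][a] == neg_inf:
                    continue
                s = best[j - 1][a] + score(a, b)
                if s > best[j][b]:
                    best[j][b] = s
                    back[j][b] = a

    # Step (i): trace back the k + 1 cutpoint indices, ending at the maximum.
    idx = [m]
    for j in range(k, 0, -1):
        idx.append(back[j][idx[-1]])
    idx.reverse()
    cuts = [x[i] for i in idx]

    # Step (ii): assumed height formula, cell mass divided by cell width.
    heights = [
        (idx[j + 1] - idx[j]) / (m * (cuts[j + 1] - cuts[j])) for j in range(k)
    ]
    return cuts, heights


random.seed(0)
data = [random.betavariate(2, 5) for _ in range(200)]  # density on (0, 1)
k = round(len(data) ** (1 / 3))                        # order n^(1/3), constant assumed
cuts, heights = variable_cell_histogram(data, k)
area = sum(h * (cuts[j + 1] - cuts[j]) for j, h in enumerate(heights))
```

Restricting cutpoints to the order statistics keeps the search finite, and the dynamic program costs on the order of $k n^2$ score evaluations. With the assumed height formula the histogram integrates to one over the sample range by construction.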
Published: 1992-03-14
Classification: Density estimation, Hellinger distance, histogram, order statistics, spacing, 62G05, 62E20
@article{1176348523,
author = {Kanazawa, Yuichiro},
title = {An Optimal Variable Cell Histogram Based on the Sample Spacings},
journal = {Ann. Statist.},
volume = {20},
number = {1},
year = {1992},
pages = {291--304},
language = {en},
url = {http://dml.mathdoc.fr/item/1176348523}
}
Kanazawa, Yuichiro. An Optimal Variable Cell Histogram Based on the Sample Spacings. Ann. Statist., Vol. 20 (1992) no. 1, pp. 291-304. http://gdmltest.u-ga.fr/item/1176348523/