Estimating entropy and entropy norm on data streams
Chakrabarti, Amit ; Do Ba, Khanh ; Muthukrishnan, S.
Internet Math., Tome 3 (2006) no. 1, p. 63-78 / Harvested from Project Euclid
We consider the problem of computing information-theoretic functions, such as entropy, on a data stream, using sublinear space. ¶ Our first result deals with a measure we call the entropy norm of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic-space, one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. ¶ Our second group of results is for estimating the empirical entropy of an input stream. We first present a sublinear-space, one-pass algorithm for this problem. For a stream of $m$ items and a given real parameter $\alpha$, our algorithm uses space $\widetilde{O}(m^{2\alpha})$ and provides an approximation of $1/\alpha$ in the worst case and $(1+\eps)$ in "most'' cases. We then present a two-pass, polylogarithmic-space, $(1+\eps)$-approximation algorithm. All our algorithms are quite simple.
Publié le : 2006-05-14
Classification:  68P99,  94A15
@article{1175266368,
     author = {Chakrabarti, Amit and Do Ba, Khanh and Muthukrishnan, S.},
     title = {Estimating entropy and entropy norm on data streams},
     journal = {Internet Math.},
     volume = {3},
     number = {1},
     year = {2006},
     pages = { 63-78},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1175266368}
}
Chakrabarti, Amit; Do Ba, Khanh; Muthukrishnan, S. Estimating entropy and entropy norm on data streams. Internet Math., Tome 3 (2006) no. 1, pp.  63-78. http://gdmltest.u-ga.fr/item/1175266368/