Bandit problems with infinitely many arms

Berry, Donald A.; Chen, Robert W.; Zame, Alan; Heath, David C.; Shepp, Larry A.

Berry, Donald A. ; Chen, Robert W. ; Zame, Alan ; Heath, David C. ; Shepp, Larry A.

Ann. Statist., Tome 25 (1997) no. 6, p. 2103-2116 / Harvested from Project Euclid

Résumé

We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with $n \to \infty$. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between $\sqrt{2}/\sqrt{n}$ and $2/\sqrt{n}$ and we exhibit classes of strategies that achieve the latter.

Publié le : 1997-10-14
Classification: Bandit problems, sequential experimentation, dynamic allocation of Bernoulli processes, staying with a winner, switching with a loser, 62C25, 62L05, 60F99

@article{1069362389,
     author = {Berry, Donald A. and Chen, Robert W. and Zame, Alan and Heath, David C. and Shepp, Larry A.},
     title = {Bandit problems with infinitely many arms},
     journal = {Ann. Statist.},
     volume = {25},
     number = {6},
     year = {1997},
     pages = { 2103-2116},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1069362389}
}

Berry, Donald A.; Chen, Robert W.; Zame, Alan; Heath, David C.; Shepp, Larry A. Bandit problems with infinitely many arms. Ann. Statist., Tome 25 (1997) no. 6, pp.  2103-2116. http://gdmltest.u-ga.fr/item/1069362389/