We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with $n \to \infty$. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure
proportion is between $\sqrt{2}/\sqrt{n}$ and $2/\sqrt{n}$ and we exhibit classes of strategies that achieve the latter.
Publié le : 1997-10-14
Classification:
Bandit problems,
sequential experimentation,
dynamic allocation of Bernoulli processes,
staying with a winner,
switching with a loser,
62C25,
62L05,
60F99
@article{1069362389,
author = {Berry, Donald A. and Chen, Robert W. and Zame, Alan and Heath, David C. and Shepp, Larry A.},
title = {Bandit problems with infinitely many arms},
journal = {Ann. Statist.},
volume = {25},
number = {6},
year = {1997},
pages = { 2103-2116},
language = {en},
url = {http://dml.mathdoc.fr/item/1069362389}
}
Berry, Donald A.; Chen, Robert W.; Zame, Alan; Heath, David C.; Shepp, Larry A. Bandit problems with infinitely many arms. Ann. Statist., Tome 25 (1997) no. 6, pp. 2103-2116. http://gdmltest.u-ga.fr/item/1069362389/