A Bernoulli Two-armed Bandit
Berry, Donald A.
Ann. Math. Statist., Vol. 43 (1972) no. 6, pp. 871-897 / Harvested from Project Euclid
One of two independent Bernoulli processes (arms) with unknown expectations $\rho$ and $\lambda$ is selected and observed at each of $n$ stages. The selection problem is sequential in that the process which is selected at a particular stage is a function of the results of previous selections as well as of prior information about $\rho$ and $\lambda$. The variables $\rho$ and $\lambda$ are assumed to be independent under the (prior) probability distribution. The objective is to maximize the expected number of successes from the $n$ selections. Sufficient conditions for the optimality of selecting one or the other of the arms are given and illustrated for example distributions. The stay-on-a-winner rule is proved.
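As a rough illustration of the sequential selection problem described above, the following sketch computes the maximal expected number of successes by backward induction. It assumes, purely for illustration, that $\rho$ and $\lambda$ have independent Beta priors with integer parameters; the paper treats more general independent priors. The function names `arm_values` and `value` are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from functools import lru_cache


def arm_values(n, a1, b1, a2, b2):
    """Expected successes over n >= 1 remaining stages when arm 1 or arm 2
    is selected now and selections are optimal afterwards.

    Assumption (not from the paper): rho ~ Beta(a1, b1) and
    lambda ~ Beta(a2, b2), independent, with positive integer parameters.
    """
    p1 = Fraction(a1, a1 + b1)  # posterior mean of rho
    p2 = Fraction(a2, a2 + b2)  # posterior mean of lambda
    # Select arm 1: a success updates (a1, b1) -> (a1 + 1, b1), a failure -> (a1, b1 + 1).
    v1 = (p1 * (1 + value(n - 1, a1 + 1, b1, a2, b2))
          + (1 - p1) * value(n - 1, a1, b1 + 1, a2, b2))
    # Select arm 2: same update applied to (a2, b2).
    v2 = (p2 * (1 + value(n - 1, a1, b1, a2 + 1, b2))
          + (1 - p2) * value(n - 1, a1, b1, a2, b2 + 1))
    return v1, v2


@lru_cache(maxsize=None)
def value(n, a1, b1, a2, b2):
    """Maximal expected number of successes from n remaining selections."""
    if n == 0:
        return Fraction(0)
    return max(arm_values(n, a1, b1, a2, b2))


if __name__ == "__main__":
    # Uniform priors on both arms, two stages.
    print(value(2, 1, 1, 1, 1))
    # Arm 1 has already produced one success; compare the two arms.
    print(arm_values(2, 2, 1, 1, 1))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that comparisons between the two arms are not affected by floating-point rounding. With uniform priors and two stages, the recursion gives an expected 13/12 successes; the number of stored states grows quickly with $n$, so this sketch is only practical for small horizons.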
Published: 1972-06-14
@article{1177692553,
     author = {Berry, Donald A.},
     title = {A Bernoulli Two-armed Bandit},
     journal = {Ann. Math. Statist.},
     volume = {43},
     number = {6},
     year = {1972},
     pages = {871--897},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1177692553}
}
Berry, Donald A. A Bernoulli Two-armed Bandit. Ann. Math. Statist., Vol. 43 (1972) no. 6, pp. 871-897. http://gdmltest.u-ga.fr/item/1177692553/