A Note on the Bernoulli Two-Armed Bandit Problem
Kelley, Thomas A.
Ann. Statist., Volume 2 (1974) no. 1, pp. 1056-1062. Harvested from Project Euclid.
Suppose the arms of a two-armed bandit generate i.i.d. Bernoulli random variables with success probabilities $\rho$ and $\lambda$, respectively. The goal is to maximize the expected sum of the outcomes of $N$ trials (that is, the expected number of successes), where $N$ is fixed. If the prior distribution of $(\rho, \lambda)$ is concentrated at two points $(a, b)$ and $(c, d)$ in the unit square, a characterization of the optimal policy is given. Necessary and sufficient conditions, stated in terms of $a, b, c$, and $d$, are given for the optimality of the myopic policy.
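The abstract states the problem only in outline. As a minimal sketch, assuming the two-point prior puts probability `prior` on $(\rho, \lambda) = (a, b)$ and $1 - $ `prior` on $(c, d)$, the optimal policy can be computed by backward induction over posterior states and compared with the myopic policy, which always pulls the arm with the larger posterior success probability. The function name `two_point_bandit`, the use of per-arm success/failure counts as the state, and the tie-breaking rule are illustrative assumptions, not taken from the paper; the paper's own characterization and its conditions on $a, b, c, d$ are not reproduced here.

```python
from functools import lru_cache


def two_point_bandit(a, b, c, d, prior, n_trials):
    """Return (optimal value, myopic value): expected successes in n_trials.

    Hypothetical helper for illustration. H1: (rho, lam) = (a, b) with
    probability `prior`; H2: (rho, lam) = (c, d) with probability 1 - prior.
    Arm 1 (index 0) succeeds w.p. rho, arm 2 (index 1) w.p. lam.
    """
    arm_probs = ((a, c), (b, d))  # per arm: (P(success | H1), P(success | H2))

    def posterior(counts):
        # counts = (s1, f1, s2, f2) are sufficient for the posterior of H1.
        s1, f1, s2, f2 = counts
        lik1 = a**s1 * (1 - a)**f1 * b**s2 * (1 - b)**f2
        lik2 = c**s1 * (1 - c)**f1 * d**s2 * (1 - d)**f2
        w1, w2 = prior * lik1, (1 - prior) * lik2
        return w1 / (w1 + w2)  # P(H1 | observations)

    def success_prob(arm, p):
        x, y = arm_probs[arm]
        return p * x + (1 - p) * y  # posterior predictive success probability

    def after(counts, arm, success):
        s1, f1, s2, f2 = counts
        if arm == 0:
            return (s1 + 1, f1, s2, f2) if success else (s1, f1 + 1, s2, f2)
        return (s1, f1, s2 + 1, f2) if success else (s1, f1, s2, f2 + 1)

    def arm_value(value_fn, n, counts, arm, q):
        # Pull `arm` now, then continue with value_fn; skip zero-probability
        # branches so unreachable states (0/0 posteriors) are never visited.
        total = 0.0
        if q > 0:
            total += q * (1 + value_fn(n - 1, after(counts, arm, True)))
        if q < 1:
            total += (1 - q) * value_fn(n - 1, after(counts, arm, False))
        return total

    @lru_cache(maxsize=None)
    def optimal(n, counts):
        if n == 0:
            return 0.0
        p = posterior(counts)
        return max(arm_value(optimal, n, counts, arm, success_prob(arm, p))
                   for arm in (0, 1))

    @lru_cache(maxsize=None)
    def myopic(n, counts):
        if n == 0:
            return 0.0
        p = posterior(counts)
        q = [success_prob(arm, p) for arm in (0, 1)]
        arm = 0 if q[0] >= q[1] else 1  # ties go to arm 1 (an assumption)
        return arm_value(myopic, n, counts, arm, q[arm])

    start = (0, 0, 0, 0)
    return optimal(n_trials, start), myopic(n_trials, start)
```

For example, with $(a, b) = (0.6, 1)$, $(c, d) = (0.6, 0)$, prior $1/2$ and $N = 5$, arm 1 is uninformative, so the myopic policy keeps pulling it and earns about $3.0$, while `two_point_bandit(0.6, 1.0, 0.6, 0.0, 0.5, 5)` reports an optimal value of about $3.7$ (pull arm 2 once to learn which point holds). The count-based state grows like $N^4$, which is fine for small illustrative $N$ only.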
Published: 1974-09-14
Classification: 62.45, 62.35
Keywords: Bernoulli random variable, myopic, optimal, posterior distribution, relative advantage, sequential, strategy, two-armed bandit problem, two-point prior distribution
@article{1176342827,
     author = {Kelley, Thomas A.},
     title = {A Note on the Bernoulli Two-Armed Bandit Problem},
     journal = {Ann. Statist.},
     volume = {2},
     number = {1},
     year = {1974},
     pages = {1056--1062},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176342827}
}
Kelley, Thomas A. A Note on the Bernoulli Two-Armed Bandit Problem. Ann. Statist., Volume 2 (1974) no. 1, pp. 1056-1062. http://gdmltest.u-ga.fr/item/1176342827/