Suppose the arms of a two-armed bandit generate i.i.d. Bernoulli random variables with success probabilities $\rho$ and $\lambda$ respectively. It is desired to maximize the expected sum of $N$ trials where $N$ is fixed. If the prior distribution of $(\rho, \lambda)$ is concentrated at two points $(a, b)$ and $(c, d)$ in the unit square, a characterization of the optimal policy is given. In terms of $a, b, c$, and $d$, necessary and sufficient conditions are given for the optimality of the myopic policy.
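The setting in the abstract can be explored numerically for small $N$. The sketch below is an illustration only and is not taken from the paper: it assumes a prior weight `p` on the point $(a, b)$ (so $1 - p$ on $(c, d)$), updates that weight by Bayes' rule after each pull, and compares the expected number of successes under brute-force backward induction with that of the myopic policy, which always pulls the arm with the larger immediate success probability. The names `posterior`, `value`, and `p` are hypothetical.

```python
def posterior(p, x, y, success):
    """Posterior weight on the first support point after one pull.

    x and y are the pulled arm's success probabilities under the first
    and second support points; p is the current weight on the first point.
    """
    like1 = x if success else 1.0 - x
    like2 = y if success else 1.0 - y
    denom = p * like1 + (1.0 - p) * like2
    # A zero denominator means the outcome has probability zero,
    # so the returned value is never actually used with positive weight.
    return p if denom == 0.0 else p * like1 / denom


def value(p, n, a, b, c, d, myopic=False):
    """Expected number of successes in n trials (assumed notation).

    Prior: weight p on (rho, lambda) = (a, b), weight 1 - p on (c, d).
    With myopic=False the better arm is found by exhaustive recursion
    (cost grows like 4**n, so keep n small); with myopic=True the arm
    with the higher immediate success probability is always pulled.
    """
    arms = ((a, c), (b, d))  # per arm: success prob under point 1, point 2

    def step(p, n, arm):
        x, y = arms[arm]
        s = p * x + (1.0 - p) * y  # predictive success probability
        return (s * (1.0 + v(posterior(p, x, y, True), n - 1))
                + (1.0 - s) * v(posterior(p, x, y, False), n - 1))

    def v(p, n):
        if n == 0:
            return 0.0
        if myopic:
            s = [p * x + (1.0 - p) * y for x, y in arms]
            return step(p, n, 0 if s[0] >= s[1] else 1)
        return max(step(p, n, 0), step(p, n, 1))

    return v(p, n)


if __name__ == "__main__":
    args = (0.5, 6, 0.8, 0.2, 0.3, 0.6)  # p, N, a, b, c, d (arbitrary values)
    print("optimal:", value(*args))
    print("myopic: ", value(*args, myopic=True))
```

Comparing the two printed values for different choices of $a, b, c, d$ gives a quick empirical check of whether the myopic policy is optimal in a particular instance; it does not replace the necessary and sufficient conditions stated in the paper.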
Published: 1974-09-14
Classification:
62.45,
62.35,
Bernoulli random variable,
myopic,
optimal,
posterior distribution,
relative advantage,
sequential,
strategy,
two-armed bandit problem,
two-point prior distribution
@article{1176342827,
author = {Kelley, Thomas A.},
title = {A Note on the Bernoulli Two-Armed Bandit Problem},
journal = {Ann. Statist.},
volume = {2},
number = {1},
year = {1974},
pages = {1056-1062},
language = {en},
url = {http://dml.mathdoc.fr/item/1176342827}
}
Kelley, Thomas A. A Note on the Bernoulli Two-Armed Bandit Problem. Ann. Statist., Vol. 2 (1974) no. 1, pp. 1056-1062. http://gdmltest.u-ga.fr/item/1176342827/