Adaptive Treatment Allocation and the Multi-Armed Bandit Problem

Lai, Tze Leung

Ann. Statist., Vol. 15 (1987) no. 1, pp. 1091-1114 / Harvested from Project Euclid

Abstract

A class of simple adaptive allocation rules is proposed for the problem (often called the "multi-armed bandit problem") of sampling $x_1, \ldots, x_N$ sequentially from $k$ populations with densities belonging to an exponential family, in order to maximize the expected value of the sum $S_N = x_1 + \cdots + x_N$. These allocation rules are based on certain upper confidence bounds, which are developed from boundary crossing theory, for the $k$ population parameters. The rules are shown to be asymptotically optimal as $N \rightarrow \infty$ from both Bayesian and frequentist points of view. Monte Carlo studies show that they also perform very well for moderate values of the horizon $N$.
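As a rough illustration of what an allocation rule driven by upper confidence bounds looks like, the sketch below simulates sequential sampling from $k$ Bernoulli populations. It is not the rule proposed in the paper: the index used here is the simpler UCB1-style bound (sample mean plus $\sqrt{2 \ln t / n_j}$) rather than the boundary-crossing bounds the abstract refers to, and the function name `ucb_allocate`, the Bernoulli populations, and the success probabilities are all assumptions made for the example.

```python
import math
import random


def ucb_allocate(arms, horizon, seed=0):
    """Sample sequentially for `horizon` steps from Bernoulli populations.

    `arms` holds hypothetical success probabilities, one per population.
    Returns the total reward S_N and the number of samples per population.
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k   # n_j: samples taken so far from population j
    sums = [0.0] * k   # running sum of observations from population j
    total = 0.0        # S_N = x_1 + ... + x_N

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: sample every population once.
            arm = t - 1
        else:
            # UCB1-style index (an assumption, not the paper's bound):
            # sample mean plus an exploration bonus that shrinks with n_j.
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        x = 1.0 if rng.random() < arms[arm] else 0.0
        counts[arm] += 1
        sums[arm] += x
        total += x

    return total, counts


if __name__ == "__main__":
    total, counts = ucb_allocate([0.2, 0.8], horizon=2000)
    print("S_N =", total, "samples per population =", counts)
```

With two populations whose means differ substantially, such a rule should concentrate most of its samples on the better population while still revisiting the other one occasionally.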

Published: 1987-09-14
Classification: Sequential experimentation, adaptive control, dynamic allocation, boundary crossings, upper confidence bounds; MSC 62L05, 60G40, 62L12

@article{1176350495,
     author = {Lai, Tze Leung},
     title = {Adaptive Treatment Allocation and the Multi-Armed Bandit Problem},
     journal = {Ann. Statist.},
     volume = {15},
     number = {1},
     year = {1987},
     pages = {1091--1114},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176350495}
}

Lai, Tze Leung. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem. Ann. Statist., Vol. 15 (1987) no. 1, pp. 1091-1114. http://gdmltest.u-ga.fr/item/1176350495/