Arm-Acquiring Bandits
Whittle, P.
Ann. Probab., Volume 9 (1981) no. 6, pp. 284-292 / Harvested from Project Euclid
We consider the problem of allocating effort between projects at different stages of development when new projects are also continually appearing. An expression (14) is derived for the expected reward yielded by the Gittins index policy. This is shown to satisfy the dynamic programming equation for the problem, so confirming optimality of the policy.
Published: 1981-04-14
Classification: Multiarmed bandit, dynamic programming, allocation index, 42C99, 62C99
@article{1176994469,
     author = {Whittle, P.},
     title = {Arm-Acquiring Bandits},
     journal = {Ann. Probab.},
     volume = {9},
     number = {6},
     year = {1981},
     pages = {284--292},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176994469}
}
Whittle, P. Arm-Acquiring Bandits. Ann. Probab., Volume 9 (1981) no. 6, pp. 284-292. http://gdmltest.u-ga.fr/item/1176994469/