On the Gittins Index for Multiarmed Bandits
Weber, Richard
Ann. Appl. Probab., Tome 2 (1992) no. 4, p. 1024-1033 / Harvested from Project Euclid
This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.
Publié le : 1992-11-14
Classification:  Multiarmed bandit problem,  stochastic scheduling,  Markov decision processes,  optimal stopping,  sequential methods,  60G40,  90B35,  62L05,  90C40
@article{1177005588,
     author = {Weber, Richard},
     title = {On the Gittins Index for Multiarmed Bandits},
     journal = {Ann. Appl. Probab.},
     volume = {2},
     number = {4},
     year = {1992},
     pages = { 1024-1033},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1177005588}
}
Weber, Richard. On the Gittins Index for Multiarmed Bandits. Ann. Appl. Probab., Tome 2 (1992) no. 4, pp.  1024-1033. http://gdmltest.u-ga.fr/item/1177005588/