On the Gittins Index for Multiarmed Bandits

Weber, Richard

Weber, Richard

Ann. Appl. Probab., Tome 2 (1992) no. 4, p. 1024-1033 / Harvested from Project Euclid

Résumé

This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.

Publié le : 1992-11-14
Classification: Multiarmed bandit problem, stochastic scheduling, Markov decision processes, optimal stopping, sequential methods, 60G40, 90B35, 62L05, 90C40

@article{1177005588,
     author = {Weber, Richard},
     title = {On the Gittins Index for Multiarmed Bandits},
     journal = {Ann. Appl. Probab.},
     volume = {2},
     number = {4},
     year = {1992},
     pages = { 1024-1033},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1177005588}
}

Weber, Richard. On the Gittins Index for Multiarmed Bandits. Ann. Appl. Probab., Tome 2 (1992) no. 4, pp.  1024-1033. http://gdmltest.u-ga.fr/item/1177005588/