This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.
@article{1177005588,
author = {Weber, Richard},
title = {On the Gittins Index for Multiarmed Bandits},
journal = {Ann. Appl. Probab.},
volume = {2},
number = {4},
year = {1992},
pages = { 1024-1033},
language = {en},
url = {http://dml.mathdoc.fr/item/1177005588}
}
Weber, Richard. On the Gittins Index for Multiarmed Bandits. Ann. Appl. Probab., Tome 2 (1992) no. 4, pp. 1024-1033. http://gdmltest.u-ga.fr/item/1177005588/