Indices for Families of Competing Markov Decision Processes with Influence
Glazebrook, K. D.
Ann. Appl. Probab., Volume 3 (1993), no. 4, pp. 1013-1032 / Harvested from Project Euclid
Nash obtained an important extension to the classical theory of Gittins indexation when he demonstrated that index policies were optimal for a class of multiarmed bandit problems with a multiplicatively separable reward structure. We characterise the relevant indices (herein referred to as Nash indices) as equivalent retirement rewards/penalties for appropriately defined maximisation/minimisation problems. We also give a condition which is sufficient to guarantee the optimality of index policies for a Nash-type model in which each constituent bandit has its own decision structure.
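The abstract characterises indices as equivalent retirement rewards. The sketch below illustrates that idea only in the classical setting that Nash's model extends: a single discounted bandit with finitely many states, not the multiplicatively separable Nash model or the paper's own construction. It uses the standard retirement-option characterisation of the Gittins index. It searches for the smallest lump-sum retirement reward M at which retiring immediately in state x is optimal, then rescales it by (1 - beta) to a per-period rate. The function names `retirement_value` and `gittins_index`, the transition matrix `P`, the reward vector `r` and the discount factor `beta` (assumed to satisfy 0 < beta < 1) are illustrative assumptions, not notation from the paper.

```python
from typing import List

Matrix = List[List[float]]


def retirement_value(P: Matrix, r: List[float], beta: float, M: float,
                     tol: float = 1e-10, max_iter: int = 100_000) -> List[float]:
    """Value iteration for V(x) = max(M, r[x] + beta * sum_y P[x][y] * V(y)).

    Hypothetical helper: M is a lump-sum retirement reward available in every state.
    """
    n = len(r)
    V = [M] * n
    for _ in range(max_iter):
        V_new = [max(M, r[x] + beta * sum(P[x][y] * V[y] for y in range(n)))
                 for x in range(n)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def gittins_index(P: Matrix, r: List[float], beta: float, x: int,
                  tol: float = 1e-8) -> float:
    """Classical discounted Gittins index of state x via the retirement reward.

    Hypothetical helper: bisects on M for the smallest retirement reward at which
    retiring in x is optimal, then rescales the lump sum to a rate with (1 - beta).
    Assumes 0 < beta < 1.
    """
    # The break-even M always lies in [min(r), max(r)] / (1 - beta).
    lo = min(r) / (1 - beta)
    hi = max(r) / (1 - beta)
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = retirement_value(P, r, beta, M)
        # Value of playing once more from x, then acting optimally.
        cont = r[x] + beta * sum(P[x][y] * V[y] for y in range(len(r)))
        if cont <= M:
            hi = M  # retiring now is optimal, so the break-even M is at most this
        else:
            lo = M  # continuing is strictly better, so the break-even M is larger
    return (1 - beta) * hi


# Hypothetical chain: state 0 pays 0 and moves to absorbing state 1, which pays 10.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [0.0, 10.0]
print(round(gittins_index(P, r, 0.5, 0), 6))  # 5.0
```

In this example state 0 has immediate reward 0 but index 5 when beta = 0.5, because the index prices the reward stream the state unlocks rather than the next payment alone. Bisection is justified because the advantage of continuing over retiring is nonincreasing in M.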
Published: 1993-11-14
Classification: Gittins index, Markov decision process, optimal policy, stopping time, 90C40
@article{1177005270,
     author = {Glazebrook, K. D.},
     title = {Indices for Families of Competing Markov Decision Processes with Influence},
     journal = {Ann. Appl. Probab.},
     volume = {3},
     number = {4},
     year = {1993},
     pages = {1013--1032},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1177005270}
}
Glazebrook, K. D. Indices for Families of Competing Markov Decision Processes with Influence. Ann. Appl. Probab., Volume 3 (1993), no. 4, pp. 1013-1032. http://gdmltest.u-ga.fr/item/1177005270/