Nash obtained an important extension to the classical theory of Gittins indexation when he demonstrated that index policies were optimal for a class of multiarmed bandit problems with a multiplicatively separable reward structure. We characterise the relevant indices (herein referred to as Nash indices) as equivalent retirement rewards/penalties for appropriately defined maximisation/minimisation problems. We also give a condition which is sufficient to guarantee the optimality of index policies for a Nash-type model in which each constituent bandit has its own decision structure.
Published: 1993-11-14
Classification:
Gittins index,
Markov decision process,
optimal policy,
stopping time,
90C40
@article{1177005270,
author = {Glazebrook, K. D.},
title = {Indices for Families of Competing Markov Decision Processes with Influence},
journal = {Ann. Appl. Probab.},
volume = {3},
number = {4},
year = {1993},
pages = {1013--1032},
language = {en},
url = {http://dml.mathdoc.fr/item/1177005270}
}
Glazebrook, K. D. Indices for Families of Competing Markov Decision Processes with Influence. Ann. Appl. Probab., Vol. 3 (1993), no. 4, pp. 1013-1032. http://gdmltest.u-ga.fr/item/1177005270/