Index-based policies for discounted multi-armed bandits on parallel machines
Glazebrook, K. D. ; Wilkinson, D. J.
Ann. Appl. Probab., Volume 10 (2000) no. 2, pp. 877-896 / Harvested from Project Euclid
We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an $O(\alpha)$ quantity of optimality, where $\alpha > 0$ is a discount rate. In the main, the policies make an initial once-for-all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.
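For orientation, the standard discrete-time form of the Gittins index referred to in the abstract can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: $X_t$ is the state of a single bandit after $t$ plays, $R$ is its reward function, $\tau$ ranges over stopping times, and $\beta = e^{-\alpha}$ is the per-period discount factor associated with the discount rate $\alpha$.

$$
G(x) \;=\; \sup_{\tau \ge 1}
\frac{E\left[\sum_{t=0}^{\tau-1} \beta^{t} R(X_t) \,\middle|\, X_0 = x\right]}
     {E\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, X_0 = x\right]}
$$

On a single machine, always playing a bandit whose current index is largest is optimal (the classical Gittins result). The abstract concerns the parallel-machine setting, where index-based policies are shown to fall short of optimality by at most an $O(\alpha)$ quantity.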
Published: 2000-08-14
Classification: Average-overtaking optimal, average-reward optimal, Gittins index, multi-armed bandit problem, parallel machines, suboptimality bound, 90B36, 90C40
@article{1019487512,
     author = {Glazebrook, K. D. and Wilkinson, D. J.},
     title = {Index-based policies for discounted multi-armed bandits on parallel machines},
     journal = {Ann. Appl. Probab.},
     volume = {10},
     number = {2},
     year = {2000},
     pages = {877--896},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1019487512}
}
Glazebrook, K. D.; Wilkinson, D. J. Index-based policies for discounted multi-armed bandits on parallel machines. Ann. Appl. Probab., Volume 10 (2000) no. 2, pp. 877-896. http://gdmltest.u-ga.fr/item/1019487512/