The Robbins-Isbell Two-Armed-Bandit Problem with Finite Memory

Smith, Carter Vincent; Pyke, Ronald

Ann. Math. Statist., Volume 36 (1965) no. 6, p. 1375-1386 / Harvested from Project Euclid

Abstract

This paper studies the sequential decision model known as the two-armed bandit with finite memory, which was introduced by Robbins [8] in 1956 and studied further by Isbell [5] in 1959. A set of rules is defined that is uniformly better than those given in [5] and [8]. A much larger class of rules is then defined, one member of which is conjectured to be a uniformly best rule.

Published: 1965-10-14

@article{1177699897,
     author = {Smith, Carter Vincent and Pyke, Ronald},
     title = {The Robbins-Isbell Two-Armed-Bandit Problem with Finite Memory},
     journal = {Ann. Math. Statist.},
     volume = {36},
     number = {6},
     year = {1965},
     pages = {1375--1386},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1177699897}
}

Smith, Carter Vincent; Pyke, Ronald. The Robbins-Isbell Two-Armed-Bandit Problem with Finite Memory. Ann. Math. Statist., Volume 36 (1965) no. 6, pp. 1375-1386. http://gdmltest.u-ga.fr/item/1177699897/