Bayesian Nonparametric Bandits
Clayton, Murray K. ; Berry, Donald A.
Ann. Statist., Volume 13 (1985) no. 1, pp. 1523-1534 / Harvested from Project Euclid
Sequential selections are to be made from two stochastic processes, or "arms." At each stage the arm selected for observation depends on past observations. The objective is to maximize the expected sum of the first $n$ observations. For arm 1 the observations are identically distributed with probability measure $P$, and for arm 2 the observations have probability measure $Q$; $P$ is a Dirichlet process and $Q$ is known. An equivalent problem is deciding sequentially when to stop sampling from an unknown population. Optimal strategies are shown to continue sampling if the current observation is sufficiently large. A simple form of such a rule is expressed in terms of a degenerate Dirichlet process which is related to $P$.
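The setting above can be illustrated with a small simulation. The sketch below is not the paper's optimal strategy; it is a simple hedged stand-in: predictive draws from a Dirichlet process posterior via the standard Blackwell-MacQueen urn, paired with a naive "continue while the current observation is large" stopping rule that switches permanently to the known arm once an observation falls below a threshold (permanently, because the known arm is uninformative). The Uniform(0, 1) base measure, unit prior mass, and the names `polya_urn_draw` and `play_threshold_rule` are all illustrative assumptions, not from the paper.

```python
import random

def polya_urn_draw(base_sampler, past, alpha_mass, rng):
    """One predictive draw from a Dirichlet process posterior via the
    Blackwell-MacQueen urn: with probability alpha_mass/(alpha_mass + k)
    draw fresh from the base measure, otherwise repeat a past observation."""
    k = len(past)
    if rng.random() < alpha_mass / (alpha_mass + k):
        return base_sampler(rng)
    return rng.choice(past)

def play_threshold_rule(base_sampler, alpha_mass, known_mean, threshold, n, rng):
    """Total reward over n pulls: sample the unknown (Dirichlet) arm while
    the most recent observation is at least `threshold`; once it drops
    below, collect known_mean from the known arm for the remaining pulls.
    This is a stopping rule, since the known arm yields no information."""
    total = 0.0
    past = []
    for t in range(n):
        x = polya_urn_draw(base_sampler, past, alpha_mass, rng)
        past.append(x)
        total += x
        if x < threshold:
            total += known_mean * (n - t - 1)  # switch to known arm for good
            break
    return total

# Illustrative parameters: Uniform(0, 1) base measure (mean 0.5), known arm
# mean 0.5, threshold set equal to the known mean, horizon n = 20.
base = lambda r: r.random()
avg = sum(play_threshold_rule(base, 1.0, 0.5, 0.5, 20, random.Random(s))
          for s in range(2000)) / 2000
print(avg)
```

Always playing the known arm would earn 10 in expectation over 20 pulls; the threshold rule does somewhat better because the urn concentrates future draws on arms that have already produced large observations, which is the "stay with a winner" effect the abstract's optimal strategies formalize.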
Published: 1985-12-14
Classification: Sequential decisions, nonparametric decisions, optimal stopping, one-armed bandits, two-armed bandits, Dirichlet bandits, 62L05, 62L15
@article{1176349753,
     author = {Clayton, Murray K. and Berry, Donald A.},
     title = {Bayesian Nonparametric Bandits},
     journal = {Ann. Statist.},
     volume = {13},
     number = {1},
     year = {1985},
     pages = {1523-1534},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176349753}
}
Clayton, Murray K.; Berry, Donald A. Bayesian Nonparametric Bandits. Ann. Statist., Volume 13 (1985) no. 1, pp. 1523-1534. http://gdmltest.u-ga.fr/item/1176349753/