We study a multi-armed bandit problem in a setting where covariates
are available. We take a nonparametric approach to estimate the functional
relationship between the response (reward) and the covariates. The estimated
relationships and appropriate randomization are used to select a good arm to
play for a greater expected reward. Randomization helps balance the tendency to
trust the currently most promising arm with further exploration of other arms.
It is shown that, with some familiar nonparametric methods (e.g., histogram),
the proposed strategy is strongly consistent in the sense that the accumulated
reward is asymptotically equivalent to that based on the best arm (which
depends on the covariates) almost surely.
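As a rough illustration of the idea described above, the following sketch pairs a per-arm histogram regression estimate with a randomized choice between the currently best-looking arm and uniform exploration. It is not the authors' algorithm as published: the class name `HistogramBandit`, the helper `simulate`, the single covariate on [0, 1), the exploration schedule `1 / sqrt(t)`, and the two toy reward functions are all assumptions made for this example.

```python
import random


class HistogramBandit:
    """Illustrative sketch: per-arm histogram regression on a covariate in [0, 1)."""

    def __init__(self, n_arms, n_bins, seed=0):
        self.n_arms = n_arms
        self.n_bins = n_bins
        self.rng = random.Random(seed)
        # Running reward sums and counts for every (arm, bin) cell.
        self.sums = [[0.0] * n_bins for _ in range(n_arms)]
        self.counts = [[0] * n_bins for _ in range(n_arms)]
        self.t = 0

    def _bin(self, x):
        # Map the covariate to a histogram bin, clamping x = 1.0 into the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def estimate(self, arm, x):
        # Histogram estimate: average observed reward in the bin containing x.
        b = self._bin(x)
        c = self.counts[arm][b]
        return self.sums[arm][b] / c if c else 0.0

    def select(self, x):
        # Assumed schedule: explore uniformly with probability 1 / sqrt(t),
        # otherwise play the arm with the highest estimated reward at x.
        self.t += 1
        explore_prob = min(1.0, 1.0 / self.t ** 0.5)
        if self.rng.random() < explore_prob:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.estimate(a, x))

    def update(self, arm, x, reward):
        b = self._bin(x)
        self.sums[arm][b] += reward
        self.counts[arm][b] += 1


def simulate(rounds, seed=0):
    """Toy environment (hypothetical): arm 0 pays about x, arm 1 pays about 1 - x."""
    env = random.Random(seed + 1)
    bandit = HistogramBandit(n_arms=2, n_bins=4, seed=seed)
    for _ in range(rounds):
        x = env.random()
        arm = bandit.select(x)
        mean = x if arm == 0 else 1.0 - x
        bandit.update(arm, x, mean + env.uniform(-0.1, 0.1))
    return bandit


if __name__ == "__main__":
    b = simulate(5000)
    for x in (0.1, 0.9):
        print(x, [round(b.estimate(a, x), 3) for a in range(2)])
```

In this toy setup the best arm depends on the covariate, so a strategy that ignores covariates cannot match the covariate-dependent best arm; the shrinking exploration probability is one simple way to keep sampling every arm while increasingly trusting the estimates.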
@article{1015362186,
author = {Yang, Yuhong and Zhu, Dan},
title = {Randomized allocation with nonparametric estimation for a
multi-armed bandit problem with covariates},
journal = {Ann. Statist.},
volume = {30},
number = {1},
year = {2002},
pages = {100--121},
language = {en},
url = {http://dml.mathdoc.fr/item/1015362186}
}
Yang, Yuhong; Zhu, Dan. Randomized allocation with nonparametric estimation for a
multi-armed bandit problem with covariates. Ann. Statist., Vol. 30 (2002) no. 1, pp. 100-121. http://gdmltest.u-ga.fr/item/1015362186/