Motivated by an application in computerized adaptive tests, we
consider the following sequential design problem.There are $J$ jobs to be
processed according to a predetermined order. A single machine is available to
process these $J$ jobs. Each job under processing evolves stochastically as a
Markov chain and earns rewards as it is processed, not otherwise. The Markov
chain has transition probabilities parameterized by an unknown parameter
$\theta$. The objective is to determine how long each job should be processed
so that the total expected rewards over an extended time interval is maximized.
We define the regret associated with a strategy as the shortfall from the
maximum expected reward under complete information on $\theta$. Therefore the
problem is equivalent to minimizing the regret. The asymptotic lower bound for
the regret associated with any uniformly good strategy is characterized by a
deterministic constraint minimization problem. In ignorance of the parameter
value, we construct a class of efficient strategies, which achieve the lower
bound, based on the theory of sequential testing.
@article{1015957475,
author = {Fuh, Cheng-Der and Hu, Inchi},
title = {Asymptotically efficient strategies for a stochastic scheduling
problem with order constraints},
journal = {Ann. Statist.},
volume = {28},
number = {3},
year = {2000},
pages = { 1670-1695},
language = {en},
url = {http://dml.mathdoc.fr/item/1015957475}
}
Fuh, Cheng-Der; Hu, Inchi. Asymptotically efficient strategies for a stochastic scheduling
problem with order constraints. Ann. Statist., Tome 28 (2000) no. 3, pp. 1670-1695. http://gdmltest.u-ga.fr/item/1015957475/