On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains
Lembersky, Mark R.
Ann. Statist., Tome 2 (1974) no. 1, p. 159-169 / Harvested from Project Euclid
For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$. We then show that there are policies that are simultaneously $\varepsilon$-optimal for all durations $t$ and stationary except possibly for a final, finite segment. Further, the length of this final segment depends on $\varepsilon$, but not on $t$ for large enough $t$, while the initial stationary part of the policy is independent of both $\varepsilon$ and $t$.
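In symbols, the first result can be sketched as follows (the notation $V(t)$, $g$, and $v$ is assumed here for illustration, not taken from the paper): writing $V(t)$ for the vector of maximal total rewards over duration $t$ and $g$ for the optimal average-return (gain) vector, the convergence statement reads

$$\lim_{t \rightarrow \infty} \bigl( V(t) - tg \bigr) = v$$

for some limit vector $v$. The second result then says that, for each $\varepsilon > 0$, there is a policy that follows a single stationary policy on $[0, t - s(\varepsilon)]$ and may vary only on a final segment of length $s(\varepsilon)$, where $s(\varepsilon)$ is independent of $t$ for all sufficiently large $t$, and the stationary part is independent of both $\varepsilon$ and $t$.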
Published: 1974-01-14
Classification:  Markov decision chains,  maximal rewards,  $\varepsilon$-optimal policies,  initially stationary policies,  dynamic programming,  90C40,  90B99,  93E20
@article{1176342621,
     author = {Lembersky, Mark R.},
     title = {On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains},
     journal = {Ann. Statist.},
     volume = {2},
     number = {1},
     year = {1974},
     pages = {159-169},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176342621}
}
Lembersky, Mark R. On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains. Ann. Statist., Tome 2 (1974) no. 1, pp. 159-169. http://gdmltest.u-ga.fr/item/1176342621/