We consider countable-state, finite-action dynamic programming problems with bounded rewards. Under Blackwell's optimality criterion, a policy is optimal if it maximizes the expected discounted total return for all values of the discount factor sufficiently close to 1. We give an example where a policy meets that optimality criterion but is not optimal with respect to Derman's average cost criterion. We also give conditions under which this pathology cannot occur.
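The two criteria contrasted in the abstract can be stated in the following standard form (the notation here is ours, not taken from the paper). For a policy $\pi$, initial state $x$, and discount factor $\beta \in [0,1)$, the expected discounted total return is

```latex
V_\beta(\pi, x) \;=\; E_x^{\pi}\!\left[\, \sum_{t=0}^{\infty} \beta^{t}\, r(x_t, a_t) \right],
\qquad 0 \le \beta < 1,
```

and $\pi^*$ is Blackwell optimal if there exists $\beta_0 \in [0,1)$ such that $V_\beta(\pi^*, x) \ge V_\beta(\pi, x)$ for every policy $\pi$, every state $x$, and every $\beta \in (\beta_0, 1)$. The average cost (average reward) criterion instead compares long-run averages of the form

```latex
\phi(\pi, x) \;=\; \liminf_{n \to \infty} \frac{1}{n}\,
E_x^{\pi}\!\left[\, \sum_{t=0}^{n-1} r(x_t, a_t) \right],
```

where conventions differ on whether the $\liminf$ or $\limsup$ is used; the paper's counterexample exhibits a Blackwell-optimal policy that fails to maximize the average criterion.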
Published: 1974-03-14
Classification:
Dynamic programming,
average cost criteria,
discounting,
Markov decision process,
49C15,
62L99,
90C40,
93C55,
60J10,
60J20
@article{1176342678,
author = {Flynn, James},
title = {Averaging vs. Discounting in Dynamic Programming: a Counterexample},
journal = {Ann. Statist.},
volume = {2},
number = {1},
year = {1974},
pages = {411-413},
language = {en},
url = {http://dml.mathdoc.fr/item/1176342678}
}
Flynn, James. Averaging vs. Discounting in Dynamic Programming: a Counterexample. Ann. Statist., Vol. 2 (1974), no. 1, pp. 411-413. http://gdmltest.u-ga.fr/item/1176342678/