We consider countable-state, finite-action dynamic programming problems with bounded rewards. Under Blackwell's optimality criterion, a policy is optimal if it maximizes the expected discounted total return for all values of the discount factor sufficiently close to 1. We give an example where a policy meets that optimality criterion but is not optimal with respect to Derman's average cost criterion. We also give conditions under which this pathology cannot occur.
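The two criteria contrasted in the abstract can be stated in the following standard form (the notation here is ours, not taken from the paper). For a policy $\pi$, initial state $x$, and discount factor $\beta \in [0,1)$, the expected discounted total return is

```latex
V_\beta(\pi, x) \;=\; E_x^{\pi}\!\left[\, \sum_{t=0}^{\infty} \beta^{t}\, r(x_t, a_t) \right],
\qquad 0 \le \beta < 1,
```

and $\pi^*$ is Blackwell optimal if there exists $\beta_0 \in [0,1)$ such that $V_\beta(\pi^*, x) \ge V_\beta(\pi, x)$ for every policy $\pi$, every state $x$, and every $\beta \in (\beta_0, 1)$. The average cost (average reward) criterion instead compares long-run averages of the form

```latex
\phi(\pi, x) \;=\; \liminf_{n \to \infty} \frac{1}{n}\,
E_x^{\pi}\!\left[\, \sum_{t=0}^{n-1} r(x_t, a_t) \right],
```

where conventions differ on whether the $\liminf$ or $\limsup$ is used; the paper's counterexample exhibits a Blackwell-optimal policy that fails to maximize the average criterion.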
Published: 1974-03-14
Classification:
Dynamic programming,
average cost criteria,
discounting,
Markov decision process,
49C15,
62L99,
90C40,
93C55,
60J10,
60J20
@article{1176342678,
author = {Flynn, James},
title = {Averaging vs. Discounting in Dynamic Programming: a Counterexample},
journal = {Ann. Statist.},
volume = {2},
number = {1},
year = {1974},
pages = {411-413},
language = {en},
url = {http://dml.mathdoc.fr/item/1176342678}
}
Flynn, James. Averaging vs. Discounting in Dynamic Programming: a Counterexample. Ann. Statist., Vol. 2 (1974), no. 1, pp. 411-413. http://gdmltest.u-ga.fr/item/1176342678/