Averaging vs. Discounting in Dynamic Programming: a Counterexample
Flynn, James
Ann. Statist., Volume 2 (1974), no. 1, pp. 411-413 / Harvested from Project Euclid
We consider countable-state, finite-action dynamic programming problems with bounded rewards. Under Blackwell's optimality criterion, a policy is optimal if it maximizes the expected discounted total return for all values of the discount factor sufficiently close to 1. We give an example in which a policy meets that criterion but is not optimal with respect to Derman's average cost criterion. We also give conditions under which this pathology cannot occur.
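The abstract contrasts two standard optimality criteria; the following LaTeX sketch states them in the usual notation as a reading aid. The symbols used here (a discount factor β, a reward r_t received at stage t, values V_β and φ for a policy π started at state x) are assumed for illustration and are not drawn from the record itself.

% Discounted criterion: expected total discounted return under policy \pi from state x.
\[
V_\beta(\pi, x) \;=\; \mathbb{E}^{x}_{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t} r_t\right],
\qquad 0 \le \beta < 1.
\]
% Blackwell's criterion: \pi^* is optimal if there is some \beta_0 < 1 such that
% V_\beta(\pi^*, x) \ge V_\beta(\pi, x) for every policy \pi, every state x,
% and every \beta \in (\beta_0, 1).
\[
\phi(\pi, x) \;=\; \liminf_{n \to \infty} \frac{1}{n}\,
\mathbb{E}^{x}_{\pi}\!\left[\sum_{t=0}^{n-1} r_t\right].
\]
% Derman's average cost (average return) criterion compares policies by \phi;
% whether \liminf or \limsup is used is a formulation detail not fixed by this record.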
Published: 1974-03-14
Classification: Dynamic programming, average cost criteria, discounting, Markov decision process; MSC: 49C15, 62L99, 90C40, 93C55, 60J10, 60J20
@article{1176342678,
     author = {Flynn, James},
     title = {Averaging vs. Discounting in Dynamic Programming: a Counterexample},
     journal = {Ann. Statist.},
     volume = {2},
     number = {1},
     year = {1974},
     pages = {411--413},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176342678}
}
Flynn, James. Averaging vs. Discounting in Dynamic Programming: a Counterexample. Ann. Statist., Volume 2 (1974), no. 1, pp. 411-413. http://gdmltest.u-ga.fr/item/1176342678/