Optimal stationary policies in risk-sensitive dynamic programs with finite state space and nonnegative rewards
Cavazos-Cadena, Rolando ; Montes-de-Oca, Raúl
Applicationes Mathematicae, Volume 27 (2000), pp. 167-185 / Harvested from The Polish Digital Mathematics Library

This work concerns controlled Markov chains with finite state space and nonnegative rewards; it is assumed that the controller has a constant risk-sensitivity, and that the performance of a control policy is measured by a risk-sensitive expected total-reward criterion. The existence of optimal stationary policies is studied within this context, and the main result establishes the optimality of a stationary policy achieving the supremum in the corresponding optimality equation, whenever the associated Markov chain has a unique positive recurrent class. Two explicit examples are provided to show that, if this additional condition fails, an optimal stationary policy cannot be guaranteed in general. The results of this note, which cover both the risk-seeking and the risk-averse cases, answer an extended version of a question recently posed in Puterman (1994).
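The optimality equation mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's construction: value iteration for the risk-sensitive total-reward equation V(x) = sup_a exp(λ r(x,a)) Σ_y P(y|x,a) V(y) on a hypothetical two-state model (state 1 is absorbing with zero reward, so the total reward is finite almost surely); all states, actions, and numbers are made up for the example.

```python
import numpy as np

# Hypothetical 2-state, 2-action model. State 1 absorbs with zero reward,
# so the accumulated reward is finite almost surely and the risk-sensitive
# value V(x) = sup_a exp(lam * r(x, a)) * sum_y P(y | x, a) * V(y) is well defined.
lam = 0.5                        # risk-seeking case (lam > 0)

# P[a, x, y] = transition probability; r[a, x] = one-step reward (nonnegative)
P = np.array([[[0.5, 0.5],       # action 0 in state 0: stay w.p. 0.5, absorb w.p. 0.5
               [0.0, 1.0]],      # state 1 is absorbing under both actions
              [[0.1, 0.9],       # action 1 in state 0: absorb w.p. 0.9
               [0.0, 1.0]]])
r = np.array([[1.0, 0.0],
              [0.5, 0.0]])

V = np.ones(2)                   # V = 1 at the absorbing state (exp of zero future reward)
for _ in range(500):
    # Q[a, x] = exp(lam * r(x, a)) * sum_y P(y | x, a) * V(y)
    Q = np.exp(lam * r) * np.einsum('axy,y->ax', P, V)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)        # stationary policy attaining the supremum
print(V, policy)
```

In this toy chain the policy read off from `Q` induces a single positive recurrent class (the absorbing state), which is exactly the condition under which the paper's main result guarantees that a stationary policy attaining the supremum is optimal.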

Published: 2000-01-01
EUDML-ID : urn:eudml:doc:219265
@article{bwmeta1.element.bwnjournal-article-zmv27i2p167bwm,
     author = {Rolando Cavazos-Cadena and Ra\'ul Montes-de-Oca},
     title = {Optimal stationary policies in risk-sensitive dynamic programs with finite state space and nonnegative rewards},
     journal = {Applicationes Mathematicae},
     volume = {27},
     year = {2000},
     pages = {167-185},
     zbl = {1006.93070},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-zmv27i2p167bwm}
}
Cavazos-Cadena, Rolando; Montes-de-Oca, Raúl. Optimal stationary policies in risk-sensitive dynamic programs with finite state space and nonnegative rewards. Applicationes Mathematicae, Volume 27 (2000), pp. 167-185. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-zmv27i2p167bwm/

[000] M. G. Ávila-Godoy (1998), Controlled Markov chains with exponential risk-sensitive criteria: modularity, structured policies and applications, Ph.D. Dissertation, Dept. of Math., Univ. of Arizona, Tucson, AZ.

[001] R. Cavazos-Cadena and E. Fernández-Gaucherand (1999), Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations, and optimal solutions, Math. Methods Oper. Res. 43, 121-139. | Zbl 0953.93077

[002] R. Cavazos-Cadena and R. Montes-de-Oca (1999), Optimal stationary policies in controlled Markov chains with the expected total-reward criterion, Research Report No. 1.01.010.99, Univ. Autónoma Metropolitana, Campus Iztapalapa, México, D.F. | Zbl 0937.90114

[003] P. C. Fishburn (1970), Utility Theory for Decision Making, Wiley, New York. | Zbl 0213.46202

[004] W. H. Fleming and D. Hernández-Hernández (1997), Risk-sensitive control of finite machines on an infinite horizon I, SIAM J. Control Optim. 35, 1790-1810. | Zbl 0891.93085

[005] O. Hernández-Lerma (1989), Adaptive Markov Control Processes, Springer, New York. | Zbl 0698.90053

[006] K. Hinderer (1970), Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter, Lecture Notes in Oper. Res. 33, Springer, New York. | Zbl 0202.18401

[007] R. A. Howard and J. E. Matheson (1972), Risk-sensitive Markov decision processes, Management Sci. 18, 356-369. | Zbl 0238.90007

[008] M. Loève (1977), Probability Theory I, 4th ed., Springer, New York. | Zbl 0359.60001

[009] J. W. Pratt (1964), Risk aversion in the small and in the large, Econometrica 32, 122-136. | Zbl 0132.13906

[010] M. L. Puterman (1994), Markov Decision Processes, Wiley, New York. | Zbl 0829.90134

[011] S. M. Ross (1970), Applied Probability Models with Optimization Applications, Holden-Day, San Francisco. | Zbl 0213.19101

[012] R. Strauch (1966), Negative dynamic programming, Ann. Math. Statist. 37, 871-890. | Zbl 0144.43201