We study the existence of sample path average cost (SPAC-) optimal policies for Markov control processes on Borel spaces with strictly unbounded costs, i.e., costs that grow without bound on the complement of compact subsets. Assuming only that the cost function is lower semicontinuous and that the transition law is weakly continuous, we show that there exists a relaxed policy with 'minimal' expected average cost and that the optimal average cost is the limit of discounted programs. Moreover, we show that if such a policy induces a positive Harris recurrent Markov chain, then it is also sample path average cost (SPAC-) optimal. We apply our results to inventory systems and, in a particular case, we explicitly compute a deterministic stationary SPAC-optimal policy.
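For orientation, the two criteria contrasted in the abstract are usually written as below. This is only a sketch in the customary notation of the Markov control literature: the symbols $c$ (one-stage cost), $x_t$, $a_t$ (state and action at time $t$), $\pi$ (policy), $J$, $J_0$ and $\rho^*$ are assumptions of this note and do not appear in the abstract, and the paper's own definitions may differ in detail.

```latex
% Expected average cost of policy \pi from initial state x (assumed notation)
J(\pi, x) := \limsup_{n \to \infty} \frac{1}{n}\,
  E_x^{\pi}\!\left[ \sum_{t=0}^{n-1} c(x_t, a_t) \right]

% Sample path average cost: the same time average without the expectation,
% so J_0(\pi, x) is a random variable
J_0(\pi, x) := \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} c(x_t, a_t)

% One common way to state SPAC-optimality of \pi^*: there is a constant \rho^* with
J_0(\pi^*, x) = \rho^* \quad P_x^{\pi^*}\text{-a.s.}, \qquad
J_0(\pi, x) \ge \rho^* \quad P_x^{\pi}\text{-a.s.} \quad \text{for all } \pi \text{ and } x
```

Read this way, the second claim of the abstract says that a policy which is minimal for $J$ also attains the almost-sure bound for $J_0$ when the induced chain is positive Harris recurrent.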
@article{bwmeta1.element.bwnjournal-article-zmv26i4p363bwm, author = {Oscar Vega-Amaya}, title = {Sample path average optimality of Markov control processes with strictly unbounded cost}, journal = {Applicationes Mathematicae}, volume = {26}, year = {1999}, pages = {363-381}, zbl = {1050.93523}, language = {en}, url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-zmv26i4p363bwm} }
Vega-Amaya, Oscar. Sample path average optimality of Markov control processes with strictly unbounded cost. Applicationes Mathematicae, Tome 26 (1999) pp. 363-381. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-zmv26i4p363bwm/
[000] A. Arapostathis et al. (1993), Discrete time controlled Markov processes with an average cost criterion: A survey, SIAM J. Control Optim. 31, 282-344. | Zbl 0770.93064
[001] D. P. Bertsekas (1987), Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Englewood Cliffs, NJ. | Zbl 0649.93001
[002] D. P. Bertsekas and S. E. Shreve (1978), Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York. | Zbl 0471.93002
[003] P. Billingsley (1968), Convergence of Probability Measures, Wiley. | Zbl 0172.21201
[004] V. S. Borkar (1991), Topics in Controlled Markov Chains, Pitman Res. Notes Math. Ser. 240, Longman Sci. Tech. | Zbl 0725.93082
[005] R. Cavazos-Cadena and E. Fernández-Gaucherand (1995), Denumerable controlled Markov chains with average reward criterion: sample path optimality, Z. Oper. Res. 41, 89-108. | Zbl 0835.90116
[006] R. M. Dudley (1989), Real Analysis and Probability, Wadsworth & Brooks. | Zbl 0686.60001
[007] P. Hall and C. C. Heyde (1980), Martingale Limit Theory and Its Application, Academic Press. | Zbl 0462.60045
[008] O. Hernández-Lerma (1993), Existence of average optimal policies in Markov control processes with strictly unbounded costs, Kybernetika 29, 1-17. | Zbl 0792.93120
[009] O. Hernández-Lerma and J. B. Lasserre (1995), Invariant probabilities for Feller-Markov chains, J. Appl. Math. Stochastic Anal. 8, 341-345. | Zbl 0870.60061
[010] O. Hernández-Lerma and J. B. Lasserre (1996), Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer, New York. | Zbl 0840.93001
[011] O. Hernández-Lerma and J. B. Lasserre (1997), Policy iteration for average cost Markov control processes on Borel spaces, Acta Appl. Math., to appear. | Zbl 0872.93080
[012] O. Hernández-Lerma and M. Muñoz-de-Osak (1992), Discrete-time Markov control processes with discounted unbounded cost: optimality criteria, Kybernetika 28, 191-212. | Zbl 0771.93054
[013] O. Hernández-Lerma, O. Vega-Amaya and G. Carrasco (1998), Sample-path optimality and variance-minimization of average cost Markov control processes, Reporte Interno #236, Departamento de Matemáticas, CINVESTAV-IPN, México City. | Zbl 0951.93074
[014] K. Hinderer (1970), Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameters, Lecture Notes in Oper. Res. and Math. Systems 33, Springer, Berlin. | Zbl 0202.18401
[015] J. B. Lasserre (1997), Sample-path average optimality for Markov control processes, Report No. 97102, LAAS-CNRS, Toulouse. | Zbl 0956.93066
[016] H. L. Lee and S. Nahmias (1993), Single-product, single-location models, in: Logistics of Production and Inventory, S. C. Graves, A. H. G. Rinnooy Kan and P. H. Zipkin (eds.), Handbooks in Operations Research and Management Science, Vol. 4, North-Holland, 3-51.
[017] P. Mandl and M. Lausmanová (1991), Two extensions of asymptotic methods in controlled Markov chains, Ann. Oper. Res. 28, 67-80. | Zbl 0754.60081
[018] S. P. Meyn (1989), Ergodic theorems for discrete time stochastic systems using a stochastic Lyapunov function, SIAM J. Control Optim. 27, 1409-1439. | Zbl 0681.60067
[019] S. P. Meyn (1995), The policy iteration algorithm for average reward Markov decision processes with general state space, preprint, Coordinated Science Laboratory, University of Illinois, Urbana, IL.
[020] S. P. Meyn and R. L. Tweedie (1993), Markov Chains and Stochastic Stability, Springer, London. | Zbl 0925.60001
[021] M. Parlar and R. Rempała (1992), Stochastic inventory problem with piecewise quadratic holding cost function containing a cost-free interval, J. Optim. Theory Appl. 75, 133-153. | Zbl 0795.90014
[022] O. Vega-Amaya and R. Montes-de-Oca (1998), Application of average dynamic programming to inventory systems, Math. Methods Oper. Res. 47, 451-471. | Zbl 0940.90007