A control problem with long-run average cost is studied for a partially observable Markov chain depending on a parameter. Using uniform ergodicity arguments, it is shown that, for values of the parameter varying in a compact set, it is possible to consider only a finite number of nearly optimal controls based on the values of actually computable approximate filters. This leads to an algorithm that guarantees nearly self-optimizing properties without identifiability conditions. The algorithm is based on probing control, whose cost is additionally assumed to be periodically observable.
@article{bwmeta1.element.bwnjournal-article-zmv22z2p165bwm,
  author = {Giovanni Di Masi and \L ukasz Stettner},
  title = {On adaptive control of a partially observed Markov chain},
  journal = {Applicationes Mathematicae},
  volume = {22},
  year = {1994},
  pages = {165-180},
  zbl = {0808.93070},
  language = {en},
  url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-zmv22z2p165bwm}
}
Di Masi, Giovanni; Stettner, Łukasz. On adaptive control of a partially observed Markov chain. Applicationes Mathematicae, Vol. 22 (1994), pp. 165-180. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-zmv22z2p165bwm/
[1] A. Arapostathis and S. I. Marcus, Analysis of an identification algorithm arising in the adaptive estimation of Markov chains, Math. Control Signals Systems 3 (1990), 1-29. | Zbl 0685.93063
[2] V. V. Baranov, A recursive algorithm in Markovian decision processes, Cybernetics 18 (1982), 499-506. | Zbl 0517.90089
[3] D. P. Bertsekas, Dynamic Programming and Stochastic Control, Academic Press, New York, 1976.
[4] J. L. Doob, Stochastic Processes, Wiley, New York, 1953. | Zbl 0053.26802
[5] W. Feller, An Introduction to Probability Theory and Its Applications II, Wiley, New York, 1971. | Zbl 0219.60003
[6] E. Fernández-Gaucherand, A. Arapostathis and S. I. Marcus, On the adaptive control of a partially observable Markov decision process, in: Proc. 27th IEEE Conf. on Decision and Control, 1988, 1204-1210.
[7] E. Fernández-Gaucherand, A. Arapostathis and S. I. Marcus, On the adaptive control of a partially observable binary Markov decision process, in: Advances in Computing and Control, W. A. Porter, S. C. Kak and J. L. Aravena (eds.), Lecture Notes in Control and Inform. Sci. 130, Springer, New York, 1989, 217-228. | Zbl 0712.93063
[8] L. G. Gubenko and E. S. Shtatland, On discrete-time Markov decision processes, Theory Probab. Math. Statist. 7 (1975), 47-61.
[9] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, New York, 1989.
[10] O. Hernández-Lerma and S. I. Marcus, Adaptive control of Markov processes with incomplete state information and unknown parameters, J. Optim. Theory Appl. 52 (1987), 227-241. | Zbl 0585.90090
[11] O. Hernández-Lerma and S. I. Marcus, Nonparametric adaptive control of discrete-time partially observable stochastic systems, J. Math. Anal. Appl. 137 (1989), 312-334. | Zbl 0675.93055
[12] A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, 1970. | Zbl 0203.50101
[13] N. W. Kartashov, Criteria for uniform ergodicity and strong stability of Markov chains in general state space, Theory Probab. Math. Statist. 30 (1985), 71-89. | Zbl 0586.60058
[14] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control, Prentice-Hall, Englewood Cliffs, 1986. | Zbl 0706.93057
[15] H. J. Kushner and H. Huang, Approximation and limit results for nonlinear filters with wide bandwidth observation noise, Stochastics 16 (1986), 65-96. | Zbl 0595.60046
[16] G. E. Monahan, A survey of partially observable Markov decision processes: theory, models and algorithms, Management Sci. 28 (1982), 1-16. | Zbl 0486.90084
[17] W. J. Runggaldier and Ł. Stettner, Nearly optimal controls for stochastic ergodic problems with partial observation, SIAM J. Control Optim. 31 (1993), 180-218. | Zbl 0770.93092
[18] Ł. Stettner, On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model, J. Appl. Math. Optim. 27 (1993), 161-177. | Zbl 0769.93084