Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost
Borkar, V. ; Associate, S.
Applicationes Mathematicae, Tome 25 (1998), p. 339-358 / Harvested from The Polish Digital Mathematics Library

This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.

Publié le : 1998-01-01
EUDML-ID : urn:eudml:doc:219208
@article{bwmeta1.element.bwnjournal-article-zmv25i3p339bwm,
     author = {V. Borkar and S. Associate},
     title = {Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost},
     journal = {Applicationes Mathematicae},
     volume = {25},
     year = {1998},
     pages = {339-358},
     zbl = {0992.93086},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-zmv25i3p339bwm}
}
Borkar, V.; Associate, S. Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost. Applicationes Mathematicae, Tome 25 (1998) pp. 339-358. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-zmv25i3p339bwm/

[000] [1] R. Agrawal, D. Teneketzis and V. Anantharam, Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space, IEEE Trans. Automatic Control AC-34 (1989), 1249-1259. | Zbl 0689.93039

[001] [2] A. Barron, Are Bayes rules consistent in information?, in: Problems in Communication and Computation, T. M. Cover and B. Gopinath (eds.), Springer, New York, 1987, 85-91.

[002] [3] R. N. Bhattacharya, Asymptotic behaviour of several dimensional diffusions, in: Stochastic Nonlinear Systems, L. Arnold and R. Lefever (eds.), Springer, New York, 1981, 86-91.

[003] [4] D. Blackwell and L. Dubins, Merging of opinions with increasing information, Ann. Math. Statist. 33 (1962), 882-887. | Zbl 0109.35704

[004] [5] V. S. Borkar, Control of Markov chains with long run average cost criterion, in: Stochastic Differential Systems, Stochastic Control Theory and Applications, W. H. Fleming and P. L. Lions (eds.), Springer, New York, 1987, 57-77.

[005] [6] V. S. Borkar, The Kumar-Becker-Lin scheme revisited, J. Optim. Theory Appl. 66 (1990), 289-309. | Zbl 0682.93060

[006] [7] V. S. Borkar, Self-tuning control of diffusions without the identifiability condition, ibid. 68 (1991), 117-137. | Zbl 0697.93036

[007] [8] V. S. Borkar, On the Milito-Cruz adaptive control scheme for Markov chains, ibid. 77 (1993), 387-397. | Zbl 0791.93055

[008] [9] V. S. Borkar, A modified self-tuner for controlled diffusions with an unknown parameter, in: Mathematical Theory of Control (Bombay, 1990), A. V. Balakrishnan and M. C. Joshi (eds.), Marcel Dekker, 1992, 57-67. | Zbl 0790.93082

[009] [10] V. S. Borkar and M. K. Ghosh, Ergodic and adaptive control of nearest neighbour motions, Math. Control Signals and Systems 4 (1991), 81-98. | Zbl 0736.93078

[010] [11] V. S. Borkar and M. K. Ghosh, Ergodic control of multidimensional diffusions II: adaptive control, Appl. Math. Optim. 21 (1990), 191-220. | Zbl 0691.93027

[011] [12] V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains I: finite parameter case, IEEE Trans. Automatic Control 24 (1979), 953-957. | Zbl 0416.93065

[012] [13] V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains, SIAM J. Control Optim. 20 (1982), 470-488. | Zbl 0491.93063

[013] [14] E. K. P. Chong and P. J. Ramadge, Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automatic Control 39 (1994), 1400-1410. | Zbl 0806.93058

[014] [15] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer, New York, 1979.

[015] [16] G. B. Di Masi and Ł. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics Stochastic Reports 54 (1995), 301-316. | Zbl 0855.93103

[016] [17] B. Doshi and S. E. Shreve, Randomized self-tuning control of Markov chains, J. Appl. Probab. 17 (1980), 726-734. | Zbl 0442.93054

[017] [18] B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications, Adv. Appl. Probab. 14 (1982), 502-525. | Zbl 0495.60094

[018] [19] P. R. Kumar and A. Becker, A new family of optimal adaptive controllers for Markov chains, IEEE Trans. Automatic Control 27 (1982), 137-142. | Zbl 0471.93069

[019] [20] P. R. Kumar and W. Lin, Optimal adaptive controllers for Markov chains, ibid. 27 (1982), 756-774. | Zbl 0488.93036

[020] [21] P. R. Kumar and P. P. Varaiya, Stochastic Systems--Estimation, Identification and Adaptive Control, Prentice-Hall, 1986. | Zbl 0706.93057

[021] [22] P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60. | Zbl 0281.60070

[022] [23] R. Milito and J. B. Cruz, Jr., An optimization oriented approach to adaptive control of Markov chains, IEEE Trans. Automatic Control 32 (1987), 754-762. | Zbl 0632.93080

[023] [24] J. N. Tsitsiklis, Asynchronous stochastic approaximation and Q-learning, Machine Learning 16 (1994), 195-202. | Zbl 0820.68105

[024] [25] K. Van Hee, Bayesian Control of Markov Chains, Math. Center Tracts, 95, Math. Center, Amsterdam, 1978.