Discrete Dynamic Programming
Blackwell, David
Ann. Math. Statist., Vol. 33 (1962), no. 4, pp. 719-726 / Harvested from Project Euclid
We consider a system with a finite number $S$ of states $s$, labeled by the integers $1, 2, \cdots, S$. Periodically, say once a day, we observe the current state of the system, and then choose an action $a$ from a finite set $A$ of possible actions. As a joint result of the current state $s$ and the chosen action $a$, two things happen: (1) we receive an immediate income $i(s, a)$ and (2) the system moves to a new state $s'$ with the probability of a particular new state $s'$ given by a function $q = q(s' \mid s, a)$. Finally there is specified a discount factor $\beta, 0 \leqq \beta < 1$, so that the value of unit income $n$ days in the future is $\beta^n$. Our problem is to choose a policy which maximizes our total expected income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book [3]. The case $\beta = 1$, also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats $\beta = 1$ as a limiting case of $\beta < 1$, seems rather simpler than Howard's.
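The discounted case $0 \leqq \beta < 1$ described in the abstract can be illustrated with a short numerical sketch. The code below uses value iteration, a standard successive-approximation technique; it is not claimed to be the method of the paper or of Howard's book [3]. The function name `value_iteration`, the nested-list layout of `income` and `q`, the tolerance, and the two-state example data are all assumptions made for illustration, and states and actions are indexed from 0 rather than from 1.

```python
from typing import List, Sequence, Tuple


def value_iteration(
    income: Sequence[Sequence[float]],        # income[s][a] = i(s, a)
    q: Sequence[Sequence[Sequence[float]]],   # q[s][a][t] = q(t | s, a)
    beta: float,                              # discount factor, 0 <= beta < 1
    tol: float = 1e-10,                       # illustrative stopping tolerance
    max_iter: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Approximate the optimal discounted income and a greedy policy."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("this sketch only covers 0 <= beta < 1")
    n_states = len(income)
    n_actions = len(income[0])

    def action_value(s: int, a: int, v: Sequence[float]) -> float:
        # Immediate income plus discounted expected future value.
        return income[s][a] + beta * sum(
            q[s][a][t] * v[t] for t in range(n_states)
        )

    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [
            max(action_value(s, a, v) for a in range(n_actions))
            for s in range(n_states)
        ]
        change = max(abs(new_v[s] - v[s]) for s in range(n_states))
        v = new_v
        if change < tol:
            break

    # Pick, in each state, an action that is greedy with respect to v.
    policy = [
        max(range(n_actions), key=lambda a, s=s: action_value(s, a, v))
        for s in range(n_states)
    ]
    return v, policy


# Hypothetical example: action 0 stays put, action 1 switches state.
# Staying pays 1 per day in state 0 and 2 per day in state 1.
income = [[1.0, 0.0], [2.0, 0.0]]
q = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
values, policy = value_iteration(income, q, beta=0.9)
print(values, policy)
```

In this example the greedy policy switches out of state 0 and then stays in state 1, with total expected discounted income of roughly 18 from state 0 and 20 from state 1. The sketch rejects $\beta = 1$, the harder case the abstract singles out, because the undiscounted sum need not converge.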
Published: 1962-06-14
@article{1177704593,
     author = {Blackwell, David},
     title = {Discrete Dynamic Programming},
     journal = {Ann. Math. Statist.},
     volume = {33},
     number = {4},
     year = {1962},
     pages = {719--726},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1177704593}
}
Blackwell, David. Discrete Dynamic Programming. Ann. Math. Statist., Vol. 33 (1962), no. 4, pp. 719-726. http://gdmltest.u-ga.fr/item/1177704593/