In the study of stochastic dynamic team problems, analytical methods for
finding optimal policies are often inapplicable due to lack of prior knowledge
of the cost function or the state dynamics. Reinforcement learning offers a
possible solution to such coordination problems. Existing learning methods for
coordinating play either rely on control sharing among controllers or, in
general, fail to guarantee convergence to optimal policies. In a
recent paper, we provided a decentralized algorithm for finding equilibrium
policies in weakly acyclic stochastic dynamic games, which contain team games
as an important special case. However, stochastic dynamic teams can in general
possess suboptimal equilibrium policies whose cost can be arbitrarily higher
than that of a team-optimal policy. In this paper, we present a reinforcement
learning algorithm and its refinements, and provide probabilistic guarantees
of convergence to globally optimal policies in team games as well as in a more
general class of coordination games. The algorithms presented here are strictly
decentralized in that they require access only to local information, such as
cost realizations, previous local actions, and state transitions.