We consider reinforcement learning algorithms in normal
form games. Using two-timescales stochastic approximation, we
introduce a model-free algorithm which is asymptotically equivalent
to the smooth fictitious play algorithm, in that both result in
asymptotic pseudotrajectories to the flow defined by the smooth
best response dynamics. Both of these algorithms are shown to
converge almost surely to a Nash distribution in two-player
zero-sum games and $N$-player partnership games. However, there are
simple games for which these, and most other adaptive processes,
fail to converge; in particular, we consider the $N$-player
matching pennies game and Shapley's variant of the
rock--scissors--paper game. By extending stochastic approximation
results to multiple timescales, we can allow each player to learn at
a different rate. We show that the resulting algorithm converges for
two-player zero-sum games and two-player partnership games, as well
as for the two special cases we consider.
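
As a rough illustration of the dynamics discussed above, the following is a
minimal sketch of smooth fictitious play with a logit (Boltzmann) smooth best
response in a two-player zero-sum game. The payoff matrix (matching pennies),
the temperature \texttt{tau} and the step-size exponents \texttt{rho1},
\texttt{rho2} are assumptions made only for this example; choosing unequal
exponents mimics the idea of letting the players learn at different rates,
and the sketch is not the model-free algorithm analysed in the paper.

\begin{verbatim}
import numpy as np

# Illustrative payoff matrix for player 1 (matching pennies); in a
# zero-sum game player 2's payoff is the negative of player 1's.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

def smooth_best_response(payoffs, tau):
    """Logit (Boltzmann) smooth best response with temperature tau."""
    z = payoffs / tau
    z = z - z.max()               # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def smooth_fictitious_play(A, steps=50_000, tau=0.1,
                           rho1=1.0, rho2=1.0, seed=0):
    """Smooth fictitious play; rho1 != rho2 makes the two players'
    belief updates move on different timescales (different rates)."""
    rng = np.random.default_rng(seed)
    n1, n2 = A.shape
    belief_about_1 = np.ones(n1) / n1   # player 2's beliefs about player 1
    belief_about_2 = np.ones(n2) / n2   # player 1's beliefs about player 2
    for t in range(1, steps + 1):
        u1 = A @ belief_about_2         # player 1's expected payoffs
        u2 = -(A.T @ belief_about_1)    # player 2's expected payoffs (zero-sum)
        a1 = rng.choice(n1, p=smooth_best_response(u1, tau))
        a2 = rng.choice(n2, p=smooth_best_response(u2, tau))
        # Stochastic-approximation belief updates with step sizes t^{-rho_i};
        # rho1 = rho2 = 1 recovers the usual empirical-frequency averaging.
        belief_about_1 += (np.eye(n1)[a1] - belief_about_1) * t ** (-rho1)
        belief_about_2 += (np.eye(n2)[a2] - belief_about_2) * t ** (-rho2)
    return belief_about_1, belief_about_2

if __name__ == "__main__":
    b1, b2 = smooth_fictitious_play(A)
    print("empirical play, player 1:", b1)   # close to (0.5, 0.5) here
    print("empirical play, player 2:", b2)
\end{verbatim}

In this example the logit Nash distribution is the uniform mixed strategy, so
the empirical frequencies of both players settle near $(0.5, 0.5)$.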