Caching is envisioned to play a critical role in next-generation content
delivery infrastructure, cellular networks, and Internet architectures. By
judiciously storing the most popular content at storage-enabled network
entities during off-peak demand periods, caching benefits both the network
infrastructure and end users during peak periods. In this context,
distributing the limited storage capacity across network entities calls for
decentralized caching schemes. Many practical caching systems involve a parent
caching node connected to multiple leaf nodes to serve user file requests. To
model the two-way interaction between caching decisions at the parent
and leaf nodes, a reinforcement learning framework is put forth. To handle the
large continuous state space, a scalable deep reinforcement learning approach
is pursued. The novel approach relies on a deep Q-network to learn the
Q-function, and thus the optimal caching policy, in an online fashion.
Reinforcing the parent node with the ability to learn and adapt to the
unknown policies of the leaf nodes, as well as to the spatio-temporal
dynamics of file requests, results in remarkable caching performance, as
corroborated through numerical tests.
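
For concreteness, the online Q-function learning referred to above can be
viewed as minimizing the standard deep Q-network (DQN) temporal-difference
objective sketched below; the notation (state s, action a, reward r,
discount factor \gamma, and target-network parameters \theta^{-}) follows
the generic DQN formulation and need not match this paper's exact design.

\[
  % Generic DQN loss: squared temporal-difference error between the
  % bootstrapped target and the current Q-value estimate.
  \mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s')}
  \Big[ \big( r + \gamma \max_{a'} Q(s',a';\theta^{-})
      - Q(s,a;\theta) \big)^{2} \Big]
\]

In the caching setting, an action could, for instance, select which files
the parent node stores for the upcoming period, with the reward reflecting
cache hits; this mapping is illustrative only.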