Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks

Wei Jiang, Gang Feng, Shuang Qin, Tak Shing Peter Yum, Guohong Cao
DOI: https://doi.org/10.1109/twc.2019.2894403
IF: 10.4
2019-01-01
IEEE Transactions on Wireless Communications
Abstract: To address the increase of multimedia traffic dominated by streaming videos, user equipment (UE) can collaboratively cache and share content to alleviate the burden on base stations. Prior work on device-to-device (D2D) caching policies assumes perfect knowledge of the content popularity distribution. Since the content popularity distribution is usually unavailable in advance, a machine learning-based caching strategy that exploits the knowledge of content demand history would be highly promising. Thus, we design D2D caching strategies using multi-agent reinforcement learning in this paper. Specifically, we model the D2D caching problem as a multi-agent multi-armed bandit problem and use Q-learning to learn how to coordinate the caching decisions. The UEs can be independent learners (ILs) if they learn the Q-values of their own actions, and joint action learners (JALs) if they learn the Q-values of their own actions in conjunction with those of the other UEs. Because the action space is vast and leads to high computational complexity, a modified combinatorial upper confidence bound algorithm is proposed to reduce the action space for both IL and JAL. The simulation results show that the proposed JAL-based caching scheme outperforms the IL-based caching scheme and other popular caching schemes in terms of average downloading latency and cache hit rate.
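The independent-learner idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each UE caches a single content item, uses epsilon-greedy exploration instead of the modified combinatorial upper confidence bound step, and rewards a UE only for hits on its own requests. The function name, the Zipf-like popularity weights, and all parameter values are hypothetical.

```python
import random


def run_independent_learners(num_ues=3, num_contents=5, steps=5000,
                             alpha=0.1, epsilon=0.1, seed=0):
    """Stateless (bandit-style) Q-learning with one Q-table per UE.

    Assumption: each UE caches exactly one content item per step and
    receives reward 1 when its own request hits that item, else 0.
    """
    rng = random.Random(seed)
    # Hypothetical Zipf-like popularity; the learners never see these weights.
    weights = [1.0 / (rank + 1) for rank in range(num_contents)]
    # q[u][c] estimates the reward UE u gets from caching content c.
    q = [[0.0] * num_contents for _ in range(num_ues)]

    for _ in range(steps):
        for u in range(num_ues):
            # Epsilon-greedy choice over the UE's own Q-values only (IL).
            if rng.random() < epsilon:
                action = rng.randrange(num_contents)
            else:
                action = q[u].index(max(q[u]))
            # Draw one request for this UE from the hidden popularity.
            requested = rng.choices(range(num_contents), weights=weights)[0]
            reward = 1.0 if requested == action else 0.0
            # Bandit Q-update: move the estimate toward the observed reward.
            q[u][action] += alpha * (reward - q[u][action])
    return q


if __name__ == "__main__":
    for ue, row in enumerate(run_independent_learners()):
        print(ue, [round(v, 3) for v in row])
```

A joint action learner would instead index its Q-values by the combination of all UEs' actions, which is why the action space grows quickly and the abstract proposes reducing it.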