Federated Offline Reinforcement Learning With Multimodal Data

Jiabao Wen, Huiao Dai, Jingyi He, Meng Xi, Shuai Xiao, Jiachen Yang
DOI: https://doi.org/10.1109/tce.2023.3330943
2023-01-01
IEEE Transactions on Consumer Electronics
Abstract: The Tactile Internet (TI) allows operators to have an immersive experience in a remote environment. During this process, users generate a large amount of demonstration data containing tactile information. It is important to use this user-generated data to improve the intelligence of Tactile Internet applications without infringing on user privacy. To learn from user-generated datasets alone, without expensive environment interaction, this paper introduces conservative policy estimation from offline reinforcement learning to ensure that the reinforcement learning algorithm converges. In addition, a dataset composed of behavior data from different users follows a multimodal distribution, in which the same state can correspond to different actions. An offline reinforcement learning algorithm is used to reconstruct and learn users' behavior within a federated learning framework, and a diffusion model is introduced to model the multimodal distribution caused by differing user preferences. Based on this, we propose a federated diffusion Q-learning (FDQL) algorithm and verify its effectiveness on the D4RL dataset. Experimental results demonstrate that the FDQL algorithm performs efficiently within the federated learning framework, effectively capturing users' multimodal behaviors and achieving state-of-the-art results.
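The multimodality point in the abstract (the same state paired with different actions) can be illustrated with a toy sketch. This is not the FDQL algorithm; the two users, the state value 0.0, and the actions -1.0 and +1.0 are hypothetical values made up for illustration. The sketch only shows why a single-mode policy fitted by squared error can land on an action that no user demonstrated, which is the kind of failure a multimodal model such as a diffusion model is meant to avoid.

```python
from statistics import mean

# Hypothetical toy data: two users face the same state but prefer opposite actions.
# Each entry is a (state, action) pair; all values are made up for illustration.
user_a = [(0.0, -1.0)] * 50
user_b = [(0.0, +1.0)] * 50
dataset = user_a + user_b

actions = [action for _, action in dataset]

# A unimodal policy trained with squared error converges to the mean action
# observed for a given state.
mean_action = mean(actions)

# Distance from that mean action to the closest action anyone actually demonstrated.
gap = min(abs(mean_action - a) for a in set(actions))

print("mean action:", mean_action)
print("distance to nearest demonstrated action:", gap)
```

In this toy case the averaged action sits between the two demonstrated modes, so it matches neither user's behavior.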
telecommunications; engineering, electrical & electronic