Reinforcement Learning-Based Resource Allocation for Multiple Vehicles with Communication-Assisted Sensing Mechanism

Yuxin Fan, Zesong Fei, Jingxuan Huang, Xinyi Wang
DOI: https://doi.org/10.3390/electronics13132442
IF: 2.9
2024-06-22
Electronics
Abstract: Autonomous vehicles (AVs) can be equipped with integrated sensing and communications (ISAC) devices to realize sensing and communication functions simultaneously. Time-division ISAC (TD-ISAC) is advantageous due to its ease of implementation, efficient deployment and integration into any system. TD-ISAC greatly enhances spectrum efficiency and equipment utilization and reduces system energy consumption. In this paper, we propose a communication-assisted sensing mechanism based on TD-ISAC to support multi-vehicle collaborative sensing. However, there are some challenges in applying TD-ISAC to AVs. First, AVs should allocate resources for sensing and communication in a dynamically changing environment. Second, the limited spectrum resources bring the problem of mutual interference of multi-vehicle signals. To address these issues, we construct a multi-vehicle signal interference model, formulate an optimization problem based on the partially observable Markov decision process (POMDP) framework and design a decentralized dynamic allocation scheme for multi-vehicle time–frequency resources based on a deep reinforcement learning (DRL) algorithm. Simulation results show that the proposed scheme performs better in miss detection probability and average system interference power compared to the DRQN algorithm without the communication-assisted sensing mechanism and the random algorithm without reinforcement learning. We can conclude that the proposed scheme can effectively allocate the resources of the TD-ISAC system and reduce interference between multiple vehicles.
engineering, electrical & electronic; computer science, information systems; physics, applied
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to efficiently allocate time-frequency resources in a dynamically changing environment in the multi-vehicle collaborative perception scenario, so as to reduce signal interference among multiple vehicles and improve the overall performance of the system. Specifically, the paper focuses on how to utilize the time-division-based integrated sensing and communication (TD-ISAC) technology to reduce the interference problem in the system by optimizing the resource allocation strategy while ensuring the vehicle sensing and communication functions.

### Main problem points:

1. **Resource allocation in a dynamic environment**: Autonomous vehicles (AVs) need to dynamically allocate resources for sensing and communication in a constantly changing environment.
2. **Multi-vehicle signal interference**: As the number of vehicles equipped with TD-ISAC systems increases, how to coordinate the resource use of these vehicles to minimize mutual interference has become an important challenge.

### Solutions:

To address the above challenges, the paper proposes the following solutions:

- **Communication-assisted sensing mechanism**: Through the communication-assisted sensing mechanism, the number of active radars in the system can be effectively reduced, and at the same time, the spectrum utilization rate can be improved through time-division, reducing the probability of multi-vehicle signals colliding in the same sub-band.
- **Construction of multi-vehicle signal interference model**: A multi-vehicle sensing and communication interference model is constructed to comprehensively understand the characteristics and sources of interference, so as to take appropriate measures to manage system interference.
- **Modeling of optimization problems based on the partially observable Markov decision process (POMDP) framework**: Using the POMDP framework, vehicles can adaptively select sensing or communication operations according to the dynamic environment and select different sub-bands to reduce multi-vehicle interference.
- **Algorithm design based on deep reinforcement learning (DRL)**: A DRL algorithm using the target network and the prioritized experience replay (PER) scheme is designed, enabling multiple vehicles to better obtain the optimal strategy for time-frequency resource allocation under uncertain environmental factors.

### Conclusion:

The scheme proposed in the paper performs better than the DRQN algorithm without the communication-assisted sensing mechanism and the random algorithm without reinforcement learning in terms of reducing the probability of missed detection and the average system interference power. This indicates that the proposed scheme can effectively allocate the resources of the TD-ISAC system and reduce the interference among multiple vehicles.
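
To make the prioritized experience replay (PER) idea mentioned above more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the common proportional PER variant, and the hyperparameters `alpha`, `beta` and `eps`, the class name `PrioritizedReplay`, and the flat action encoding in `decode_action` (operating mode combined with sub-band index) are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, field

# Hypothetical action encoding: each vehicle picks an operating mode and a sub-band.
MODES = ("sense", "communicate")


def decode_action(action, n_subbands):
    """Map a flat action index to (mode, sub-band index)."""
    return MODES[action // n_subbands], action % n_subbands


@dataclass
class PrioritizedReplay:
    """Toy list-backed buffer using proportional prioritized sampling."""
    capacity: int
    alpha: float = 0.6  # assumed: how strongly priorities skew sampling
    beta: float = 0.4   # assumed: strength of the importance-sampling correction
    eps: float = 1e-3   # assumed: keeps every priority strictly positive
    items: list = field(default_factory=list)
    priorities: list = field(default_factory=list)

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so each one has a good chance of being replayed at least once.
        priority = max(self.priorities, default=1.0)
        if len(self.items) >= self.capacity:
            self.items.pop(0)       # drop the oldest transition
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        probs = self.probabilities()
        indices = rng.choices(range(len(self.items)), weights=probs, k=batch_size)
        n = len(self.items)
        # Importance weights w_i = (N * P(i))^(-beta), normalised by the batch maximum.
        weights = [(n * probs[i]) ** (-self.beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return indices, [self.items[i] for i in indices], weights

    def update(self, indices, td_errors):
        # After a learning step, priorities follow the absolute TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, each vehicle's agent would call `add` after every step, draw a batch with `sample`, scale its loss by the returned weights, and then call `update` with the new TD errors. The target network and the recurrent Q-network themselves are left out of this sketch.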