Domain Knowledge-Assisted Deep Reinforcement Learning Power Allocation for MIMO Radar Detection

Yuedong Wang, Yan Liang, Huixia Zhang, Yijing Gu
DOI: https://doi.org/10.1109/jsen.2022.3211606
IF: 4.3
2022-01-01
IEEE Sensors Journal
Abstract: The power allocation of multiple-input multiple-output (MIMO) radars is a key point in target tracking and detection. The optimality of a multitarget and multiconstraint optimization problem strictly depends on an a priori model, which is difficult to obtain in time-varying, complex, and noncooperative environments. Recently, deep reinforcement learning (DRL) has been applied to target tracking tasks, providing a trial-and-error interactive learning mechanism to improve the policy. Unlike tracking tasks with complete target state transition models, DRL-based MIMO radar detection remains an open issue: it requires efficiently adapting the control policy to an environment of randomly appearing targets and extensive power transmission actions, which leads to sparse final task rewards and hence slow policy learning for the agent. By introducing both an analytic model (the radar equation) and empirical rules (expert preferences) as domain knowledge, this article proposes a domain-knowledge-assisted DRL (DKADRL) framework in which a domain-knowledge-based timely reward generator produces timely rewards that assist the agent’s policy learning. To balance the timely rewards and the final task rewards, a reward fusion module is designed that gradually increases the role of the final task rewards as training progresses, thus allowing the agent’s policy to converge to the final optimization goal. The algorithm is validated under two target motion scenarios, showing higher target detection probability and faster training speed than equal power allocation and proximal policy optimization (PPO)-based power allocation.
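
The reward fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion rule, so the linear schedule and the names `fusion_weight`, `fuse_rewards`, `step`, and `total_steps` below are assumptions made for illustration, not the authors' formulation. The sketch only shows the stated behavior: the weight on the final task reward grows as training progresses.

```python
def fusion_weight(step: int, total_steps: int) -> float:
    """Weight on the final task reward.

    Assumed linear schedule (hypothetical): 0 at the start of training,
    1 once `step` reaches `total_steps`, clipped to [0, 1].
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def fuse_rewards(timely_reward: float, task_reward: float,
                 step: int, total_steps: int) -> float:
    """Convex combination of the timely and final task rewards.

    Early in training the domain-knowledge-based timely reward dominates;
    late in training the final task reward dominates.
    """
    w = fusion_weight(step, total_steps)
    return (1.0 - w) * timely_reward + w * task_reward
```

Under this assumed schedule, `fuse_rewards(r_timely, r_task, 0, T)` returns only the timely reward and `fuse_rewards(r_timely, r_task, T, T)` returns only the final task reward. Any monotonically increasing weight would show the same qualitative behavior.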