Scalable Parallel Task Scheduling for Autonomous Driving Using Multi-Task Deep Reinforcement Learning

Qi, Lingxin Zhang, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jianxin Liao, F. Richard Yu
DOI: https://doi.org/10.1109/tvt.2020.3029864
IF: 6.8
2020-01-01
IEEE Transactions on Vehicular Technology
Abstract: The Internet of Vehicles (IoV), a promising application of the Internet of Things (IoT), plays a significant role in autonomous driving by connecting intelligent vehicles. Autonomous driving must process massive amounts of environmental sensing data in coordination with surrounding vehicles and make accurate driving judgments accordingly. Since vehicles have limited computing resources, processing these data in parallel with efficient task scheduling is one of the most important topics. Most current work formulates special scenarios and service requirements as optimization problems. However, the complicated and dynamic environment of vehicular computing is hard to model, predict, and control, which makes those methods unscalable and unable to reflect real scenarios. In this paper, a Multi-task Deep reinforcement learning approach for scalable parallel Task Scheduling (MDTS) is proposed. To avoid the curse of dimensionality when coping with complex parallel computing environments and jobs with diverse properties, we extend the action selection in Deep Reinforcement Learning (DRL) to a multi-task decision, in which the output branches of multi-task learning are matched one-to-one to the parallel scheduling tasks. Child tasks of a job are accordingly assigned to distributed nodes without any human knowledge, while the resource competition among parallel tasks is handled through shared neural network layers. Moreover, we design a reward function that optimizes multiple metrics simultaneously, with emphasis on specific scenarios. Extensive experiments show that MDTS significantly increases the overall reward, reaching 2.93 compared with -16.71 for least-connection scheduling and -0.67 for the particle swarm optimization algorithm.
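The branched action selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the single hidden layer, the greedy argmax selection, and all names (init_params, branch_scores, assign_tasks) are assumptions made for illustration only. The point it shows is that with one output branch per child task, the network emits num_tasks * num_nodes scores instead of one score per joint assignment (num_nodes ** num_tasks), while every branch reads the same shared hidden layer.

```python
import random


def init_params(state_dim, hidden_dim, num_tasks, num_nodes, seed=0):
    """Random weights for one shared layer and one output head per child task.

    All dimensions are hypothetical; the paper's abstract does not specify them.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "shared": matrix(state_dim, hidden_dim),
        "heads": [matrix(hidden_dim, num_nodes) for _ in range(num_tasks)],
    }


def matvec(vec, mat):
    """Multiply a row vector (len = rows) by a rows x cols matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def branch_scores(params, state):
    """Return one list of per-node scores for each child task."""
    # Shared layer with ReLU: every branch sees the same hidden features,
    # which is where contention between parallel tasks could be represented.
    hidden = [max(0.0, h) for h in matvec(state, params["shared"])]
    return [matvec(hidden, head) for head in params["heads"]]


def assign_tasks(params, state):
    """Greedy choice: each branch picks the node with its highest score."""
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in branch_scores(params, state)
    ]
```

For example, with 3 child tasks and 4 nodes, `assign_tasks(init_params(6, 8, 3, 4), [0.5] * 6)` returns a list of 3 node indices, one per child task. A trained agent would learn the weights from the reward signal rather than use random ones.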