An Interactively Motion-Assisted Network for Multiple Object Tracking in Complex Traffic Scenes
Zhiqi Shen, Kaiquan Cai, Peng Zhao, Xiaoyan Luo
DOI: https://doi.org/10.1109/tits.2023.3316691
IF: 8.5
2024-01-01
IEEE Transactions on Intelligent Transportation Systems
Abstract: Multiple object tracking plays a crucial role in intelligent transportation systems. Because traffic objects vary in size, move fast, and are frequently occluded, multiple object tracking in complex traffic scenes is prone to low tracking accuracy and track fragmentation. To address these problems, numerous tracking methods based on discriminative object features have been proposed. However, these methods neglect the role of historical tracking information as a guiding prior for detection, which makes them poorly suited to more complex scenes, such as those containing extremely small, fast-moving, or severely occluded objects. In this paper, we propose an Interactively Motion-assisted Network (IMANet) for multiple object tracking in complex traffic scenes. First, to capture object motion patterns from historical frames, we propose an object motion modeling module that accounts for camera movement, which is particularly advantageous on videos captured by moving cameras. Next, a multi-scale fusion detection and embedding module is designed to incorporate the historical motion information, thereby improving detection performance. Finally, multiple object tracking is achieved by associating objects detected in different frames based on the detection and embedding results. The proposed method combines detection and tracking interactively: detection is improved using historical information provided by the tracking module, and better detection in turn enhances tracking. Several real-world traffic examples are used to illustrate the performance of the proposed method in both detecting and tracking traffic objects. The results demonstrate that the proposed method outperforms state-of-the-art methods, especially in complex surveillance videos with objects of varying sizes and occlusions.
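The final association step described in the abstract can be illustrated with a minimal sketch. This is not the IMANet implementation: the abstract does not specify the motion model, the similarity measure, or the matching rule. The sketch assumes a constant-velocity prediction shifted by a known apparent camera displacement, a score that blends box IoU with embedding cosine similarity, and greedy matching. All names (`predict_box`, `associate`, `camera_shift`, `alpha`, `min_score`) are hypothetical.

```python
import math


def predict_box(box, velocity, camera_shift):
    """Shift a box (x1, y1, x2, y2) by the object velocity plus the
    apparent image displacement caused by camera motion (assumed known)."""
    dx = velocity[0] + camera_shift[0]
    dy = velocity[1] + camera_shift[1]
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def associate(tracks, detections, camera_shift=(0.0, 0.0),
              alpha=0.5, min_score=0.3):
    """Greedily match tracks to detections.

    tracks: dict mapping track id -> {"box", "velocity", "embedding"}
    detections: list of {"box", "embedding"}
    Returns a dict mapping track id -> index of the matched detection.
    """
    scored = []
    for track_id, track in tracks.items():
        predicted = predict_box(track["box"], track["velocity"], camera_shift)
        for det_index, det in enumerate(detections):
            # Blend motion agreement (IoU) with appearance agreement (cosine).
            score = (alpha * iou(predicted, det["box"])
                     + (1.0 - alpha) * cosine(track["embedding"], det["embedding"]))
            if score >= min_score:
                scored.append((score, track_id, det_index))

    # Accept the highest-scoring pairs first; each track and each
    # detection is used at most once.
    scored.sort(key=lambda item: item[0], reverse=True)
    matches, used_detections = {}, set()
    for _, track_id, det_index in scored:
        if track_id in matches or det_index in used_detections:
            continue
        matches[track_id] = det_index
        used_detections.add(det_index)
    return matches
```

Greedy matching is used here only to keep the sketch dependency-free. An optimal assignment solver could be substituted without changing the scoring. Detections left unmatched would typically start new tracks.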