Optimizing Camera Motion with MCTS and Target Motion Modeling in Multi-Target Active Object Tracking

Zheng Chen, Jian Zhao, Mingyu Yang, Wengang Zhou, Houqiang Li
DOI: https://doi.org/10.1145/3648369
2024-02-21
Abstract: In this work, we study multi-target active object tracking (AOT), where the goal is to track targets continuously through real-time camera control. This form of active camera control can be applied to unmanned aerial vehicles (UAVs), intelligent robots, and sports events. Our work is conducted in an environment with multiple cameras and multiple targets, where the objective is to maximize target coverage. In contrast to previous research, we give the cameras additional degrees of freedom, allowing them not only to rotate but also to move along boundary lines. In addition, we model target motion to predict each target's future position in the environment. Given the predicted positions, we use Monte Carlo Tree Search (MCTS) to find the optimal camera action. Since the action space is large, we propose to use the action selection of a multi-agent reinforcement learning (MARL) network to prune the MCTS search tree, so that the optimal action is found more efficiently. We establish a multi-target 2D environment to simulate several sports games, and experimental results demonstrate that our method effectively improves target coverage. The code is available at: http://github.com/HopeChanger/ActiveObjectTracking.
computer science, information systems, theory & methods, software engineering
What problem does this paper attempt to address?
This paper addresses the problem of continuously tracking multiple targets in Active Object Tracking (AOT) by controlling cameras in real time. Specifically, the authors focus on how to maximize target coverage in a multi-camera, multi-target environment. Unlike previous studies, this work introduces more degrees of freedom for the camera, allowing not only rotation but also movement along the boundary. Additionally, the authors model the motion of the targets to predict their future positions and use the Monte Carlo Tree Search (MCTS) method to find the optimal camera actions. To improve search efficiency, they propose using the actions selected by a Multi-Agent Reinforcement Learning (MARL) network to prune the MCTS search tree.

### Main Contributions of the Paper:

1. **Extended the degrees of freedom for the camera**: Increased the positional freedom of the camera in multi-target AOT, making the camera setup closer to scenarios in actual sports events.
2. **Proposed the MARL network**: Predicted camera actions in a centralized setting, fully utilizing the information shared between cameras.
3. **Modeled target motion**: Predicted the future state of the environment and combined it with the MCTS method to obtain better actions based on future information.

### Research Background:

- **Single-target Active Object Tracking**: Early research mainly focused on single-target tracking, usually divided into two stages: target detection and camera control. In recent years, with the development of deep learning, passive target tracking has made significant progress, but the two-stage method still has bottlenecks in dealing with occlusion and camera control errors.
- **Multi-target Active Object Tracking**: When there are multiple targets in the environment, it is difficult for one camera to track all targets simultaneously, thus requiring an evaluation of the coverage of multiple cameras. Related studies include target coverage enhancement in directional sensor networks and the application of multi-agent reinforcement learning in multi-target tracking.

### Method Overview:

1. **2D Environment**: Designed a 2D environment simulating sports event scenarios, where cameras can move along the boundary of the field and rotate freely to achieve real-time tracking.
2. **Position Prediction Module**: Used current and historical observation states to predict future environmental states, providing a basis for Monte Carlo Tree Search.
3. **Multi-Agent Reinforcement Learning Module**: Pruned the search tree using the actions output by the MARL network to improve search efficiency.
4. **Monte Carlo Tree Search Module**: Searched for camera actions based on current and future environmental states to find the optimal actions.

### Experimental Results:

The authors established a 2D environment to simulate various sports event scenarios. Experimental results show that their method can effectively improve target coverage.

### Conclusion:

By introducing more degrees of freedom for the camera, modeling target motion, and using multi-agent reinforcement learning to prune the search tree, this paper effectively addresses the target coverage problem in multi-target active object tracking.
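
To make the interplay of the modules in the method overview more concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that do not come from the paper: a single camera sliding along the bottom boundary (y = 0), a constant-velocity motion model in place of the learned position predictor, a hand-written heuristic (`prior_top_k`) standing in for the trained MARL network, and illustrative values for the field width, field of view, and sensing range. All function and constant names are hypothetical.

```python
import math
import random

# All names and constants here are illustrative assumptions, not values from the paper.
ACTIONS = [(dx, dth) for dx in (-1, 0, 1) for dth in (-1, 0, 1)]  # (move, rotate)
STEP, TURN = 1.0, math.radians(10)                     # per-step move and turn
WIDTH, HALF_FOV, RANGE = 20.0, math.radians(30), 15.0  # field and sensor geometry


def angle_diff(a, b):
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def predict(targets, t):
    """Constant-velocity forecast (assumed model); each target is (x, y, vx, vy)."""
    return [(x + vx * t, y + vy * t) for x, y, vx, vy in targets]


def coverage(cam, points):
    """Fraction of points inside the view sector of cam = (x, heading)."""
    if not points:
        return 0.0
    cx, heading = cam
    seen = sum(
        math.hypot(px - cx, py) <= RANGE
        and angle_diff(math.atan2(py, px - cx), heading) <= HALF_FOV
        for px, py in points
    )
    return seen / len(points)


def step(cam, action):
    """Slide along the bottom boundary (y = 0), clamped to the field, and rotate."""
    x = min(max(cam[0] + action[0] * STEP, 0.0), WIDTH)
    return (x, cam[1] + action[1] * TURN)


def prior_top_k(cam, points, k):
    """Heuristic stand-in for a trained MARL policy: keep the k actions whose
    resulting heading points closest to the centroid of the predicted targets."""
    if not points:
        return ACTIONS[:k]
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)

    def error(action):
        x, heading = step(cam, action)
        return angle_diff(math.atan2(my, mx - x), heading)

    return sorted(ACTIONS, key=error)[:k]


class Node:
    def __init__(self, cam, depth):
        self.cam, self.depth = cam, depth
        self.children = {}  # action -> Node
        self.visits, self.value = 0, 0.0


def uct(parent, child, c):
    """Mean return plus the usual UCT exploration bonus."""
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts(cam, targets, horizon=3, k=3, iters=200, c=1.0, seed=0):
    """Return the most-visited root action after a UCT search over pruned actions."""
    rng = random.Random(seed)
    root = Node(cam, 0)
    for _ in range(iters):
        node, path, ret = root, [root], 0.0
        while node.depth < horizon:
            points = predict(targets, node.depth + 1)
            allowed = prior_top_k(node.cam, points, k)  # pruning: only k children per node
            untried = [a for a in allowed if a not in node.children]
            if untried:  # expansion
                action = rng.choice(untried)
                node.children[action] = Node(step(node.cam, action), node.depth + 1)
            else:        # selection
                action = max(allowed, key=lambda a: uct(node, node.children[a], c))
            node = node.children[action]
            ret += coverage(node.cam, points)
            path.append(node)
            if untried:
                break
        sim_cam, depth = node.cam, node.depth
        while depth < horizon:  # rollout, also restricted to the pruned actions
            depth += 1
            points = predict(targets, depth)
            sim_cam = step(sim_cam, rng.choice(prior_top_k(sim_cam, points, k)))
            ret += coverage(sim_cam, points)
        for visited in path:    # backup of the horizon-normalized return
            visited.visits += 1
            visited.value += ret / horizon
    return max(root.children, key=lambda a: root.children[a].visits)
```

For example, `mcts((10.0, math.pi / 2), [(4.0, 6.0, 0.0, 0.0), (5.0, 7.0, 0.0, 0.0)])` searches from a camera at x = 10 facing straight into the field and returns a `(move, rotate)` pair drawn from the pruned action set. With `k = 3` each node expands 3 of the 9 actions, so a depth-3 tree has at most 27 leaves instead of 729, which is the kind of saving that pruning is meant to provide when the joint action space of several cameras grows much larger.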