Enhancing Visual Monitoring Via Multi-Feature Fusion and Template Update Strategies

Fahad Rafique, Liying Zheng, Acheraf Benarab, Muhammad Hafeez Javed
DOI: https://doi.org/10.1007/s11760-024-03526-1
2024-01-01
Abstract: Recent advancements in computer vision, particularly deep learning, have significantly influenced visual monitoring across varied scenes. However, traditional machine learning approaches, particularly those based on correlation filtering (CF), remain valuable due to their efficiency in data collection, lower computational needs and improved explainability. While CF-based tracking methods have become popular for analyzing complex scenes, they often rely on a single feature, which limits their ability to capture dynamic target appearances and results in inaccurate target tracking. Traditional template update techniques can also lead to low accuracy and inaccurate feature extraction. To address these limitations, we introduce a location fusion mechanism that incorporates multiple feature information streams to improve real-time monitoring in complex scenes. The approach periodically extracts four types of features and fuses their response maps, ensuring robust target tracking with high accuracy. Further innovations, such as dynamic spatial regularization and a multi-memory tracking framework, enable the filters to focus on more reliable regions and suppress response deviations across consecutive frames. A novel template update, storage and retrieval mechanism is implemented on the basis of a confidence score. Extensive testing on datasets such as OTB100, VOT2016 and VOT2018 confirms that these integrated approaches outperform 26 state-of-the-art algorithms by balancing tracking success and computational efficiency in complex scenes.
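The abstract does not give formulas for the fusion or the update rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a peak-to-sidelobe-style confidence score, confidence-proportional fusion weights, and a thresholded linear-interpolation template update. The function names (`peak_confidence`, `fuse_responses`, `update_template`) and the `threshold` and `rate` defaults are hypothetical, and the response maps are plain nested lists standing in for per-feature correlation filter outputs.

```python
from statistics import mean, pstdev


def peak_confidence(response):
    """PSR-like score (an assumption): peak height above the mean, in std units."""
    values = [v for row in response for v in row]
    spread = pstdev(values)
    if spread == 0:
        return 0.0  # a flat map carries no localization evidence
    return (max(values) - mean(values)) / spread


def fuse_responses(responses):
    """Fuse per-feature response maps with confidence-proportional weights.

    Returns the fused map, the (row, col) of its peak, and the weights used.
    """
    scores = [peak_confidence(r) for r in responses]
    total = sum(scores)
    if total == 0:
        # No map is informative: fall back to uniform weights.
        weights = [1.0 / len(responses)] * len(responses)
    else:
        weights = [s / total for s in scores]

    rows, cols = len(responses[0]), len(responses[0][0])
    fused = [
        [sum(w * r[i][j] for w, r in zip(weights, responses)) for j in range(cols)]
        for i in range(rows)
    ]
    peak = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda p: fused[p[0]][p[1]],
    )
    return fused, peak, weights


def update_template(template, observation, confidence, threshold=2.0, rate=0.02):
    """Blend the new observation into the template only when confidence is high.

    threshold and rate are illustrative values, not taken from the paper.
    """
    if confidence < threshold:
        return list(template)  # low confidence: keep the stored template as is
    return [(1 - rate) * t + rate * o for t, o in zip(template, observation)]
```

In this sketch a feature whose response map has a sharp, isolated peak contributes more to the fused location estimate than one with a diffuse response, and a frame with low confidence leaves the template untouched, which is one simple way to avoid contaminating it during occlusion or drift. The paper's method fuses four feature types; the sketch accepts any number of maps of equal size.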