Multi-Involution Memory Network for Unsupervised Video Object Segmentation

Jialing Lin, Bo Li
DOI: https://doi.org/10.1109/ijcnn60899.2024.10649910
2024-01-01
Abstract: Unsupervised Video Object Segmentation (UVOS) aims to autonomously recognize and segment primary foreground objects within a given video sequence without additional inputs. Current methods often rely on motion information, such as optical flow, between adjacent frames, which is fused with appearance features. However, this approach overlooks the latent deep object representations within the features and neglects the inter-frame connectivity over longer distances. In this paper, we present a pioneering solution, the Multi-Involution Memory Network (MIMN), designed to overcome these challenges. Specifically, we introduce the Multi-Involution Selective Kernel (MISK), a mechanism for aggregating deep representation features across various receptive fields. To facilitate contextual information exchange between frames at different distances, our proposed Inter Memory Aggregation (IMA) incorporates a novel temporal attention mechanism. This mechanism selectively identifies the most crucial features in the memory bank, enhancing the prediction accuracy for the current frame. Extensive empirical studies conducted on multiple benchmarks demonstrate the promising performance and high efficiency of the proposed MIMN method. The integration of MISK and IMA contributes to improved UVOS outcomes, addressing limitations present in existing approaches.
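The abstract does not specify how IMA computes its temporal attention, so the following is only a minimal sketch of the general idea of reading from a memory bank with attention. It assumes a generic scaled dot-product read, which is a stand-in technique and not the paper's IMA module. The function name `memory_read` and all tensor shapes are hypothetical and chosen for illustration.

```python
import numpy as np

def memory_read(query, memory_keys, memory_values):
    """Attention-weighted read from a memory bank (illustrative only).

    query:         (d,)   feature of the current frame (assumed shape)
    memory_keys:   (T, d) one key per stored past frame (assumed shape)
    memory_values: (T, c) one value per stored past frame (assumed shape)
    Returns the (c,) readout and the (T,) attention weights.
    """
    d = query.shape[-1]
    # Similarity between the current frame and every stored frame.
    scores = memory_keys @ query / np.sqrt(d)      # (T,)
    # Numerically stable softmax over the temporal axis.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()
    # Frames with higher weight contribute more to the readout.
    readout = weights @ memory_values              # (c,)
    return readout, weights

# Toy example: three stored frames, the second one matches the query best.
keys = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
values = np.eye(3)
query = np.array([1.0, 0.0])
readout, weights = memory_read(query, keys, values)
```

In this toy setup the softmax weights play the role of "selecting the most crucial features": the stored frame whose key is most similar to the current frame dominates the readout, regardless of how far back in time it was stored.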