Learning effective feature representation for video object segmentation via memory

Jun Li, Lijuan Sun, Hengyi Ren, Ying Cao, Suya Li, Xin Xie
DOI: https://doi.org/10.1016/j.knosys.2024.112020
IF: 8.139
2024-06-02
Knowledge-Based Systems
Abstract: To address the problem that target discrimination is ignored when building the feature of the current frame, this paper proposes the effective feature representation via memory (EFRM) method. EFRM forms an effective and discriminative feature representation of the current frame from both global and local perspectives by fully exploiting the rich information contained in the memorized frames. First, the global discriminative feature, which represents the differences between the foreground and the background, is generated through the space–time memory read (STMR) with a nonlocal matching scheme. Second, the local discriminative feature, which captures the feature differences among the targets appearing in the current frame, is generated by the proposed per-object memory enhancement (PoME), relying only on the diverse representations of the targets shown in the memorized frames. Finally, the segmentation of the current frame is produced from the effective feature representation formed by concatenating the global and local discriminative features. Evaluations on DAVIS 2016, DAVIS 2017, YouTube-VOS 2018 and YouTube-VOS 2019 demonstrate the competitive performance of the proposed method.
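The abstract does not give EFRM's equations, so the following is only a minimal sketch, under stated assumptions, of the two operations it names explicitly. The global feature uses a standard STM-style space–time memory read: every current-frame (query) pixel is compared with every pixel of every memorized frame, the affinities are softmax-normalized over memory locations, the memory values are aggregated with those weights, and the result is concatenated with the query value. The final step concatenates global and local features along the channel axis. The function names, the (T, H, W, C) layout, and the absence of a 1/sqrt(C) scaling are assumptions made for illustration. The local feature is taken as a given input because PoME's internals are not described in the abstract.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def space_time_memory_read(mem_key, mem_val, qry_key, qry_val):
    """Hypothetical STM-style nonlocal memory read (not the authors' code).

    mem_key: (T, H, W, Ck) keys of T memorized frames
    mem_val: (T, H, W, Cv) values of T memorized frames
    qry_key: (H, W, Ck)    key of the current frame
    qry_val: (H, W, Cv)    value of the current frame
    Returns a (H, W, 2*Cv) "global" feature.
    """
    T, H, W, Ck = mem_key.shape
    Cv = mem_val.shape[-1]
    mk = mem_key.reshape(T * H * W, Ck)  # every space-time memory location
    mv = mem_val.reshape(T * H * W, Cv)
    qk = qry_key.reshape(H * W, Ck)      # every current-frame location
    # Nonlocal affinity between each query pixel and each memory pixel.
    affinity = qk @ mk.T                 # (HW, THW)
    weights = softmax(affinity, axis=1)  # normalize over memory locations
    read = (weights @ mv).reshape(H, W, Cv)
    # Concatenate the retrieved memory value with the query's own value.
    return np.concatenate([read, qry_val], axis=-1)

def fuse_features(global_feat, local_feat):
    # Assumed fusion: channel-wise concatenation of global and local features,
    # where local_feat stands in for the (unspecified) PoME output.
    return np.concatenate([global_feat, local_feat], axis=-1)
```

For example, with random keys and values of shape (2, 3, 4, ·) for memory and (3, 4, ·) for the query, `space_time_memory_read` returns a (3, 4, 2*Cv) array whose first Cv channels are a convex combination of memory values. `fuse_features` then appends whatever local channels are supplied. Because the softmax runs over all T·H·W memory locations, cost grows linearly with the number of memorized frames.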
computer science, artificial intelligence
What problem does this paper attempt to address?