A survey of methods for addressing the challenges of referring image segmentation

Lixia Ji, Yunlong Du, Yiping Dang, Wenzhao Gao, Han Zhang
DOI: https://doi.org/10.1016/j.neucom.2024.127599
IF: 6
2024-05-01
Neurocomputing
Abstract: Referring image segmentation uses natural language descriptions to segment the target objects in an image. The task differs from semantic segmentation and instance segmentation in that it involves unique challenges such as multimodal information fusion, variability of natural language expressions, and model robustness. In recent years, deep learning techniques have led to innovative ideas and methods for solving these problems. We systematically analyze the main challenges of referring image segmentation and summarize the existing solutions, which include strategies for multimodal fusion, expression query, multimodal pre-training, and robustness. In addition, we provide an overview of several datasets commonly used in referring image segmentation and compare the performance of representative approaches across different datasets, visual backbone models, and threshold settings. We also discuss open challenges and future developments in the field. This survey aims to provide a comprehensive technical reference for future researchers.
computer science, artificial intelligence