Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching
Haitao Shi, Meng Liu, Xiaoxuan Mu, Xuemeng Song, Yupeng Hu, Liqiang Nie
DOI: https://doi.org/10.1145/3662732
IF: 4.657
2024-04-29
ACM Transactions on Information Systems
Abstract: Unleashing the power of image-text matching in real-world applications is hampered by noisy correspondence. Manually curating high-quality datasets is expensive and time-consuming, and datasets generated using diffusion models are not sufficiently well aligned. The most promising approach is to collect image-text pairs from the Internet, but this inevitably introduces noisy correspondence. To reduce the negative impact of noisy correspondence, we propose a novel model that first transforms the noisy correspondence filtering problem into a similarity distribution modeling problem by exploiting the powerful capabilities of pre-trained models. Specifically, we use a Gaussian Mixture Model to model the similarity scores obtained by CLIP as a clean distribution and a noisy distribution, which lets us filter out most of the noisy correspondence in the dataset. We then use the relatively clean data to fine-tune the model. To further reduce the negative impact of the noisy correspondence that remains unfiltered during fine-tuning, i.e., the small portion where the two distributions overlap, we propose a distribution-sensitive dynamic margin ranking loss that further increases the distance between the two distributions. Through continuous iteration, the noisy correspondence gradually decreases and the model performance gradually improves. Our extensive experiments demonstrate the effectiveness and robustness of our model even under high noise rates.
computer science, information systems
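
Illustrative sketch (not the authors' implementation): the filtering step described in the abstract, fitting a two-component Gaussian mixture to per-pair similarity scores and keeping the pairs that most likely belong to the higher-mean ("clean") component, might look roughly like the following. The function names, the 0.5 posterior threshold, the plain EM fit, and the synthetic scores standing in for CLIP similarities are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, n_iter=100, min_var=1e-6):
    """Fit a 1-D two-component GMM with plain EM (hypothetical helper).

    Returns (weights, means, variances), ordered so that index 1 is the
    higher-mean component, treated here as the "clean" distribution.
    """
    n = len(scores)
    overall_mean = sum(scores) / n
    overall_var = max(sum((s - overall_mean) ** 2 for s in scores) / n, min_var)
    # Initialise the two means at the extremes of the observed scores.
    means = [min(scores), max(scores)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each score.
        resp = []
        for s in scores:
            p = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)
    if means[0] > means[1]:
        # Keep the convention that component 1 has the higher mean.
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances


def clean_probability(score, weights, means, variances):
    """Posterior probability that a score came from the higher-mean component."""
    p = [weights[k] * normal_pdf(score, means[k], variances[k]) for k in range(2)]
    total = (p[0] + p[1]) or 1e-300
    return p[1] / total


def filter_pairs(scores, threshold=0.5):
    """Return indices of pairs whose clean posterior exceeds the threshold."""
    params = fit_two_component_gmm(scores)
    return [i for i, s in enumerate(scores) if clean_probability(s, *params) > threshold]


if __name__ == "__main__":
    # Synthetic stand-in for image-text similarities (assumed values, not real data).
    rng = random.Random(0)
    scores = [rng.gauss(0.30, 0.03) for _ in range(700)]   # well-matched pairs
    scores += [rng.gauss(0.15, 0.03) for _ in range(300)]  # mismatched pairs
    kept = filter_pairs(scores)
    print(f"kept {len(kept)} of {len(scores)} pairs")
```

In practice the scores would come from a pre-trained model such as CLIP rather than a random generator, and the fit would be repeated after each round of fine-tuning, in line with the iterative procedure the abstract describes.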