LabCLIP: Label-Enhanced CLIP for Improving Zero-Shot Text Classification.

Yongheng Zhang, Peng Wang, Qiguang Chen, Jingxuan Zhou, Yongmei Wang, Min Li, Libo Qin
DOI: https://doi.org/10.1109/ICASSP48485.2024.10446865
2024-01-01
Abstract: Zero-shot text classification aims to handle the text classification task without any annotated training data, which can greatly alleviate the data scarcity problem. Current dominant approaches follow a novel text-image matching paradigm, reformulating zero-shot text classification as a text-image matching problem, which captures visual image information and shows promising performance. Nevertheless, existing text-image matching approaches focus solely on the visual image information, ignoring the semantic knowledge embedded in the text labels. To address this challenge, in this work we present a label-enhanced CLIP framework (LabCLIP) for zero-shot text classification that considers both the visual image and the text label semantic information simultaneously. Specifically, LabCLIP first converts each label into a corresponding image, and then injects the text label into that label image to explicitly capture the label semantic knowledge. We conduct experiments on 8 publicly available zero-shot text classification datasets, and the experimental results indicate that LabCLIP outperforms previous approaches on all datasets (with a 4.3% improvement on average). In addition, we provide extensive analysis exploring how to effectively incorporate the text label information.
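The matching step that the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the input text and each label image have already been embedded into a shared vector space (as CLIP-style encoders do), and it uses made-up three-dimensional toy vectors in place of real encoder outputs. The names `cosine`, `classify`, and `label_image_embs` are hypothetical and chosen only for this example. How LabCLIP builds the label image and injects the label text into it is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(text_emb, label_image_embs):
    """Return the label whose image embedding is most similar to the text.

    text_emb: embedding of the input text (assumed precomputed).
    label_image_embs: dict mapping each label to the embedding of its
        label image (assumed precomputed; toy values below).
    """
    scores = {
        label: cosine(text_emb, emb)
        for label, emb in label_image_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings, invented for illustration only.
label_image_embs = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.1],
    "technology": [0.0, 0.2, 0.9],
}
text_emb = [0.8, 0.2, 0.1]

predicted, scores = classify(text_emb, label_image_embs)
print(predicted)
```

No labeled training examples are used: the prediction comes entirely from similarity between the text embedding and one embedding per candidate label, which is what makes the setup zero-shot.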