CSMA-CNER: Multi-modal Chinese NER Task with Cross- and Self-Modality Attention

Bo Kong, Shengquan Liu, Liang He, Liruizhi Jia, Yi Liang
DOI: https://doi.org/10.1109/icme57554.2024.10688285
2024-01-01
Abstract: Many studies improve Chinese Named Entity Recognition (NER) models with dictionary-based word enhancement and supplementary multimodal information. However, these approaches rely primarily on static weights between modalities, so they fail to capture fine-grained correlations within the text modality and across modalities. As a result, they do not fully exploit the available multimodal information, and valuable data is lost. To overcome these limitations, this paper proposes the Cross- and Self-Modality Attention network, which dynamically captures correlations within the text modality and across modalities at multiple levels, enhancing multimodal mutual information and reducing information loss. Additionally, we introduce two CNN structures to extract glyph-based visual information and phonetic information. We conducted extensive experiments on Weibo, Resume, OntoNotes 4.0, and MSRA, and the results show that our approach outperforms state-of-the-art (SOTA) baseline methods.
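
The contrast between static modality weights and dynamically computed correlations can be illustrated with a minimal sketch of generic scaled dot-product attention. This is not the paper's implementation: the function names, the toy 2-dimensional features, and the fusion by simple addition are all assumptions made for illustration. Self-modality attention lets text features attend to other text features, while cross-modality attention lets text features attend to features from another modality (here, hypothetical glyph features).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors.

    Each query produces a weighted average of `values`, where the
    weights depend on the query-key similarity (computed per input,
    not fixed in advance).
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy features for three characters (assumed values).
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
glyph_feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]

# Self-modality attention: text attends to text.
self_out = attention(text_feats, text_feats, text_feats)

# Cross-modality attention: text queries attend to glyph keys/values.
cross_out = attention(text_feats, glyph_feats, glyph_feats)

# Assumed fusion step: element-wise sum of the two attended views.
fused = [
    [s + c for s, c in zip(s_vec, c_vec)]
    for s_vec, c_vec in zip(self_out, cross_out)
]
```

Because the attention weights are recomputed from each query-key pair, the contribution of every character or modality feature varies with the input, unlike a fixed per-modality weight.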