Chinese NER Using Multi-View Transformer

Yinlong Xiao, Zongcheng Ji, Jianqiang Li, Mei Han
DOI: https://doi.org/10.1109/taslp.2024.3426287
2024-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract: Integrating lexical knowledge into Chinese named entity recognition (NER) has proven effective. Among existing methods, the Flat-LAttice Transformer (FLAT) has achieved great success in both performance and efficiency. FLAT performs lexical enhancement for each sentence by constructing a flat lattice (i.e., a sequence of tokens comprising the characters in the sentence and the words matched in a lexicon) and computing self-attention over a fully-connected structure. However, different interactions between tokens contribute different aspects of semantic information for Chinese NER, and fully-connected self-attention cannot capture them well. In this paper, we propose a novel Multi-View Transformer (MVT) to capture these different interactions effectively. We first define four views, each corresponding to a different token interaction structure. We then construct a view-aware visible matrix for each view according to its structure and introduce a view-aware dot-product attention that uses the visible matrix to restrict the attention scope. Finally, we design three MVT variants that fuse the multi-view features at different levels of the Transformer architecture. Experimental results on four public Chinese NER datasets demonstrate the effectiveness of the proposed method. Specifically, on Weibo, the most challenging dataset, which is written in an informal style, MVT outperforms FLAT in F1 score by 2.56%, and when combined with BERT, MVT outperforms FLAT in F1 score by 3.03%.
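The flat lattice and the idea of limiting attention with a visible matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the four views, so the `spans_overlap` rule below is a hypothetical stand-in for one view, and all function names, the example sentence, and the toy lexicon are assumptions made for illustration.

```python
import math


def build_flat_lattice(sentence, lexicon):
    """Flat lattice: every character, followed by every lexicon word found in
    the sentence. Each token is (text, head, tail) with inclusive positions."""
    tokens = [(ch, i, i) for i, ch in enumerate(sentence)]
    for head in range(len(sentence)):
        for tail in range(head + 1, len(sentence)):
            word = sentence[head:tail + 1]
            if word in lexicon:
                tokens.append((word, head, tail))
    return tokens


def spans_overlap(query, key):
    """Hypothetical view rule (an assumption, not taken from the paper):
    a token may attend to any token whose span overlaps its own."""
    return query[1] <= key[2] and key[1] <= query[2]


def build_visible_matrix(tokens, can_see):
    """visible[i][j] is True when token i is allowed to attend to token j."""
    return [[can_see(q, k) for k in tokens] for q in tokens]


def view_aware_attention(queries, keys, values, visible):
    """Scaled dot-product attention where invisible pairs are scored as -inf,
    so they receive zero weight after the softmax."""
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if visible[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        top = max(scores)
        if top == -math.inf:  # token sees nothing: return a zero vector
            outputs.append([0.0] * len(values[0]))
            continue
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        outputs.append([
            sum(e / total * v[c] for e, v in zip(exps, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Toy example (sentence and lexicon are illustrative assumptions).
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
tokens = build_flat_lattice(sentence, lexicon)
visible = build_visible_matrix(tokens, spans_overlap)

# Zero queries and keys make attention uniform over the visible tokens;
# one-hot values make each output row equal to its attention weights.
n = len(tokens)
zeros = [[0.0] for _ in range(n)]
one_hot = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
weights = view_aware_attention(zeros, zeros, one_hot, visible)
```

In a multi-view setting, one such visible matrix would be built per view and the per-view outputs fused afterwards; how MVT defines the views and performs the fusion is described in the paper itself and is not reproduced here.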