DSWA: A Dilated Shift Window Attention Method for Chinese Named Entity Recognition (S).

Xinyu Hou, Cui Zhu, Wenjun Zhu
DOI: https://doi.org/10.18293/dmsviva2023-106
2023-01-01
Abstract: Numerous recent models have tried to improve the performance of the Transformer on Chinese NER tasks. The model can be enhanced in two ways: by combining it with lexicon augmentation techniques, or by optimizing the Transformer model itself. According to prior research, fully connected self-attention may scatter the attention distribution, which explains the worse performance of the original Transformer with self-attention. In this paper, we attempt to optimize the Transformer model, in particular its attention layer. To address this problem, we propose a novel attention mechanism, Dilated Shift Window Attention (DSWA). By using Window Attention, the method improves the model's capacity to handle local information, while the Window Dilatation mechanism still allows the model to handle long text and long-distance dependencies. Experiments on several datasets show that replacing fully connected self-attention with DSWA improves the model's performance on the Chinese NER task.
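To make the idea of windowed attention with dilation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the window size, the dilation schedule, or how the shift step works, so the function names (dilated_window_mask, masked_softmax) and the parameters (window, dilation) are illustrative assumptions, and the shift step is left out entirely. The sketch only shows how a token can be restricted to a few neighbours spaced `dilation` positions apart, so that attention stays concentrated while still reaching distant tokens.

```python
import math


def dilated_window_mask(seq_len, window, dilation):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption (not from the paper): each position attends to `window`
    positions centred on itself, spaced `dilation` apart. With
    dilation == 1 this is ordinary local window attention.
    """
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for k in range(-half, half + 1):
            j = i + k * dilation
            if 0 <= j < seq_len:  # drop neighbours that fall off the sequence
                mask[i][j] = True
    return mask


def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, restricted to allowed positions."""
    # Subtract the largest allowed score for numerical stability.
    top = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: 7 tokens, window of 3, dilation of 2.
mask = dilated_window_mask(seq_len=7, window=3, dilation=2)
weights = masked_softmax([0.0] * 7, mask[3])
```

With these example values, token 3 attends only to tokens 1, 3 and 5, and uniform scores give each of them equal weight. A fully connected layer would spread the same weight over all 7 tokens, which is the scattering effect the abstract describes.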