Enhanced Pre-Trained Transformer with Aligned Attention Map for Text Matching

Qi Liu, Miaohui Zhang, Anan Zhang, Qi Huang, Jiang Wei, Xu Sun
DOI: https://doi.org/10.1109/icmlca63499.2024.10754171
2024-01-01
Abstract: In recent years, large pre-trained models and extensive prior knowledge have driven significant advances in text matching. However, the structure of these models has not been fully optimized for the matching task. Persistent issues include inadequate short-sentence matching and a lack of robustness in that setting. Our work builds on a large pre-trained model and aims to improve its text matching performance by modifying the masking methods within the Transformer architecture and strengthening the model's matching and alignment capabilities. Specifically, we make two key improvements. First, we introduce a novel alignment matching attention mechanism into the multi-head attention module of the Transformer. Second, we introduce an additional mask matrix that distinguishes characters requiring attention from those that do not, which sharpens the focus on matching information and reinforces the intermediate features that carry alignment information. Building on this, we propose a gate-based feature fusion method that combines the features computed by the Transformer's original attention with the alignment-enhanced features, amplifying the matching capability of the original pre-trained model. Our method improves on both BERT and RoBERTa, surpassing the original models on multiple datasets, including QQP, QNLI, MNLI, and Sci-Tail.
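The abstract does not give the exact formulation, so the following is only a minimal sketch of how an alignment mask and a gated fusion of two feature streams might look. Everything in it is an assumption made for illustration, not the authors' implementation: the rule that a token attends only to tokens of the other sentence, the use of segment id 0 for special or padding tokens, the scalar sigmoid gate, and the hypothetical names `alignment_mask`, `masked_softmax`, and `gated_fusion`.

```python
import math


def alignment_mask(segment_ids):
    """Return an n x n 0/1 mask; entry [i][j] is 1 when token i may attend to token j.

    Assumption: a token attends only to tokens of the *other* sentence.
    Segment id 0 marks special/padding tokens, which are masked out everywhere.
    """
    n = len(segment_ids)
    return [
        [
            1 if segment_ids[i] and segment_ids[j] and segment_ids[i] != segment_ids[j] else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


def masked_softmax(scores, mask_row):
    """Softmax over positions where mask_row is 1; masked positions get weight 0."""
    kept = [s for s, m in zip(scores, mask_row) if m]
    if not kept:  # fully masked row, e.g. a special token
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(h_orig, h_align, w, b):
    """Blend two feature vectors with a scalar sigmoid gate (an assumed form).

    g = sigmoid(w . [h_orig; h_align] + b)
    output = g * h_orig + (1 - g) * h_align
    """
    z = sum(wi * xi for wi, xi in zip(w, h_orig + h_align)) + b
    g = 1.0 / (1.0 + math.exp(-z))
    return [g * o + (1.0 - g) * a for o, a in zip(h_orig, h_align)]


# Example layout: [CLS] a1 a2 [SEP] b1 b2 [SEP], with 0 for special tokens.
mask = alignment_mask([0, 1, 1, 0, 2, 2, 0])
weights = masked_softmax([0.2, 1.5, 0.3, 0.0, 2.0, 1.0, 0.1], mask[1])
fused = gated_fusion([1.0, 1.0], [3.0, 3.0], [0.0, 0.0, 0.0, 0.0], 0.0)
```

In this sketch, the row of `mask` for token `a1` is nonzero only at `b1` and `b2`, so the masked attention weights concentrate on cross-sentence pairs. With zero gate weights the gate equals 0.5 and `fused` is the plain average of the two streams; in a trained model the gate parameters would be learned.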