KATSum: Knowledge-aware Abstractive Text Summarization

Guan Wang, Weihua Li, Edmund Lai, Jianhua Jiang
DOI: https://doi.org/10.48550/arXiv.2212.03371
2022-12-07
Abstract: Text Summarization is recognised as one of the NLP downstream tasks and it has been extensively investigated in recent years. It can assist people with perceiving the information rapidly from the Internet, including news articles, social posts, videos, etc. Most existing research works attempt to develop summarization models to produce a better output. However, advent limitations of most existing models emerge, including unfaithfulness and factual errors. In this paper, we propose a novel model, named as Knowledge-aware Abstractive Text Summarization, which leverages the advantages offered by Knowledge Graph to enhance the standard Seq2Seq model. On top of that, the Knowledge Graph triplets are extracted from the source text and utilised to provide keywords with relational information, producing coherent and factually errorless summaries. We conduct extensive experiments by using real-world data sets. The results reveal that the proposed framework can effectively utilise the information from Knowledge Graph and significantly reduce the factual errors in the summary.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two weaknesses of existing abstractive text summarization models: unfaithfulness and factual errors. Such models may drift from the main idea of the source text, omit key information, or generate content that contradicts the source, any of which can mislead readers. To address these problems, the authors propose Knowledge-aware Abstractive Text Summarization (KATSum), a model that uses a Knowledge Graph (KG) to enhance the standard Sequence-to-Sequence (Seq2Seq) model. KATSum extracts Knowledge Graph triples from the source text and uses them to supply keywords with relational information, so that the generated summaries are coherent and factually consistent with the source. The experimental results show that the framework makes effective use of the Knowledge Graph information and significantly reduces factual errors in the summaries.

### Specific problem description:

1. **Unfaithfulness**: The generated summary strays from the main idea of the source text and fails to retain important information.
2. **Factual error**: The generated summary contains information that does not agree with the facts in the source text.

### Solutions:

- **Introduction of Knowledge Graph**: Construct a Knowledge Graph by extracting key triples (head entity, relation, tail entity) from the source text, and use the relational information in these triples during summary generation.
- **Use of classifier**: Use a trained classifier to decide which triples should be included in the summary, reducing the influence of noisy information.
- **Fusion of encoder and decoder**: Combine pre-trained language models (such as BERT) with Knowledge Graph embeddings to generate high-quality summaries.
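
The triple-filtering idea in the solutions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `Triple`, `select_triples`, `entity_keywords`, and `toy_score` are hypothetical names, and `toy_score` is a keyword-matching stand-in for the trained classifier, whose architecture is not described in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Triple:
    """One Knowledge Graph fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def select_triples(
    triples: Iterable[Triple],
    score: Callable[[Triple], float],
    threshold: float = 0.5,
) -> List[Triple]:
    """Keep the triples whose score reaches the threshold.

    `score` plays the role of the trained classifier: any callable that
    maps a triple to a relevance score in [0, 1] can be plugged in.
    """
    return [t for t in triples if score(t) >= threshold]


def entity_keywords(triples: Iterable[Triple]) -> List[str]:
    """Collect head and tail entities in first-seen order, without repeats."""
    seen = set()
    keywords = []
    for t in triples:
        for entity in (t.head, t.tail):
            if entity not in seen:
                seen.add(entity)
                keywords.append(entity)
    return keywords


def toy_score(triple: Triple) -> float:
    """Hypothetical stand-in for the classifier (assumption, not from the paper).

    Scores a triple 1.0 if either entity is in a hand-picked salient set.
    """
    salient = {"KATSum", "Knowledge Graph"}
    return 1.0 if triple.head in salient or triple.tail in salient else 0.0


if __name__ == "__main__":
    extracted = [
        Triple("KATSum", "enhances", "Seq2Seq model"),
        Triple("reporter", "visited", "cafe"),
    ]
    kept = select_triples(extracted, toy_score)
    print(kept)
    print(entity_keywords(kept))
```

Passing the scorer in as a callable keeps the selection logic independent of how relevance is estimated, so a real classifier could replace `toy_score` without changing the other functions.
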
### Experimental verification:

- **Dataset**: Two real-world datasets, CNN/Daily Mail and XSum, are used in the experiments.
- **Evaluation metric**: ROUGE scores are used as the evaluation metrics, specifically ROUGE-1, ROUGE-2 and ROUGE-L.
- **Experimental result**: KATSum outperforms the baseline models on both datasets, with the most pronounced improvement when XLNet is used as the encoder.

With these methods, KATSum reduces the unfaithfulness and factual errors found in existing abstractive text summarization models and improves the quality of the generated summaries.
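
For readers unfamiliar with the ROUGE metrics named in the evaluation, the sketch below computes ROUGE-1, ROUGE-2 and ROUGE-L F1 scores from scratch. It is a simplified illustration, not the evaluation script used in the paper: it assumes lowercase whitespace tokenization, a single reference, and a plain F1 (equal weight on precision and recall), and the function names are hypothetical. Published results usually come from a standard ROUGE toolkit that adds stemming and other preprocessing.

```python
from collections import Counter
from typing import List


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per n-gram
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_len(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    print(rouge_n(generated, gold, 1), rouge_n(generated, gold, 2), rouge_l(generated, gold))
```

ROUGE measures lexical overlap with a reference summary, so a high score does not by itself guarantee factual correctness.
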