Improved hybrid text summarization system using deep contextualized embeddings and statistical features
Mahak Gambhir, Vishal Gupta
DOI: https://doi.org/10.1007/s11042-024-19524-x
IF: 2.577
2024-06-14
Multimedia Tools and Applications
Abstract: In this digital world, where an enormous volume of textual material is added to the internet every day, there is a great need for systems that can automatically produce human-like summaries. A number of extractive text summarization methods have been proposed in the past, but they used either statistical or semantic techniques. To combine the two, we have developed a novel hybrid text summarization model, AttSum-Hybrid, that takes into account language context and relationships within the text and also captures structural information about the sentences while creating an extractive summary of a document. This hybrid summarization framework combines a deep learning-based contextual model with a statistical feature-based model. BERT (Bidirectional Encoder Representations from Transformers) is used as the feature extractor in the contextualized representation model. To learn semantic and syntactic relationships from the textual sequence, this model also uses a Convolutional Bi-LSTM (Bidirectional Long Short-Term Memory) network. In addition, we have developed a statistical feature representation framework that incorporates a few better-performing sentence scoring features. On the Daily Mail corpus, the model obtains ROUGE recall scores of 38.67, 15.37, and 34.12 for R-1, R-2, and R-L, respectively, whereas on the DUC 2002 dataset the ROUGE scores are 59.74, 28.80, and 57.87, respectively. Full-length ROUGE F1 scores of 43.15, 20.08, and 39.55 have been obtained in experiments with the combined CNN/Daily Mail dataset. Overall, the proposed hybrid summarization model achieves improved results compared with state-of-the-art baseline techniques. These results have been further validated by two summary validation tasks (a question answering task and the number of common sentences in the extract) performed by three human experts.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
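
The abstract reports ROUGE recall for R-1 and R-2. As a rough illustration of what that measure counts, the following is a minimal sketch of n-gram recall in Python. It is a simplification under stated assumptions: whitespace tokenization, lowercasing, no stemming or stopword removal, and a single reference summary. The function names `ngrams` and `rouge_n_recall` are hypothetical, and the abstract does not say which ROUGE implementation or settings the authors used, so this sketch is not expected to reproduce the reported scores.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-grams.

    Assumption: plain lowercased whitespace tokens, one reference summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate (clipped counts).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n_recall(candidate, reference, n=1)  # 5 of 6 reference unigrams matched
r2 = rouge_n_recall(candidate, reference, n=2)  # 3 of 5 reference bigrams matched
```

Recall divides by the number of n-grams in the reference, so a longer extract can only keep or raise the score. The F1 variant also cited in the abstract balances recall against precision.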