A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods

Hanlei Jin, Yang Zhang, Dan Meng, Jun Wang, Jinghua Tan
2024-03-05
Abstract: Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort required in processing large volumes of text. ATS has drawn considerable interest in both academic and industrial circles. Many studies have been conducted in the past to survey ATS methods; however, they generally lack practicality for real-world implementations, as they often categorize previous methods from a theoretical standpoint. Moreover, the advent of Large Language Models (LLMs) has altered conventional ATS methods. In this survey, we aim to 1) provide a comprehensive overview of ATS from a "Process-Oriented Schema" perspective, which is best aligned with real-world implementations; 2) comprehensively review the latest LLM-based ATS works; and 3) deliver an up-to-date survey of ATS, bridging the two-year gap in the literature. To the best of our knowledge, this is the first survey to specifically investigate LLM-based ATS methods.
Artificial Intelligence
What problem does this paper attempt to address?
This paper surveys Automatic Text Summarization (ATS) from a process-oriented perspective and examines approaches based on Large Language Models (LLMs). Existing ATS surveys usually group methods into theoretical categories, such as extractive or abstractive, and these categories do not map cleanly onto real-world implementations. The arrival of LLMs has also changed conventional ATS methods. The main objectives of the paper are:

1. Provide a comprehensive overview of ATS organized around a "Process-Oriented Schema" that aligns with real-world implementations.
2. Review the latest LLM-based ATS work.
3. Deliver an up-to-date survey of ATS that bridges the two-year gap in the literature. To the authors' knowledge, this is the first survey dedicated to LLM-based ATS methods.

The paper points out that the rapid growth of the internet has produced large volumes of text, which makes ATS a key technology for information processing. Many earlier surveys cover ATS methods, but they classify them from a theoretical perspective. This paper instead follows the implementation process of ATS (data acquisition, preprocessing, modeling methods, and evaluation metrics) to give more practical guidance. It also discusses the impact of LLMs on ATS, since these models can significantly improve the accuracy and coherence of summaries.

The paper reviews existing open-source datasets, analyzes their characteristics, and explores methods for creating new datasets, including rule-based and LLM-based annotation techniques. Finally, it summarizes the challenges and limitations of the ATS field and outlines directions for future research.
Overall, the study aims to provide a comprehensive roadmap for ATS that helps researchers and engineers understand and apply the relevant technologies.
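To make the four process stages named above (data acquisition, preprocessing, modeling, evaluation) more concrete, here is a minimal Python sketch of such a pipeline. It is an illustration under stated assumptions, not the method of the surveyed paper: the function names (`acquire`, `preprocess`, `summarize`, `rouge1_recall`), the toy document, the small stopword list, and the frequency-based extractive scorer are all hypothetical choices made for this example, and the evaluation is a simplified unigram recall in the spirit of ROUGE-1 rather than the official ROUGE implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for sentence scoring.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that",
             "for", "on", "with", "as", "are", "into", "from"}


def tokenize(text, drop_stopwords=False):
    """Lowercase the text and split it into alphanumeric word tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return words


def acquire():
    """Stage 1, data acquisition: return a toy (document, reference) pair.

    A real system would load an open-source dataset or build a new one.
    """
    document = (
        "Automatic text summarization condenses long documents into short summaries. "
        "Extractive methods select existing sentences from the source document. "
        "Abstractive methods generate new sentences that paraphrase the source. "
        "Evaluation commonly compares a system summary against a human reference."
    )
    reference = "Summarization condenses long documents into short summaries."
    return document, reference


def preprocess(document):
    """Stage 2, preprocessing: split into sentences and tokenize each one."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    sentences = [s.strip() for s in parts if s.strip()]
    tokens = [tokenize(s, drop_stopwords=True) for s in sentences]
    return sentences, tokens


def summarize(sentences, tokens, k=1):
    """Stage 3, modeling: a frequency-based extractive baseline (assumption).

    Each sentence is scored by the average document-level frequency of its
    words; the k best sentences are returned in their original order.
    """
    freq = Counter(w for sent in tokens for w in sent)
    scores = [sum(freq[w] for w in sent) / len(sent) if sent else 0.0
              for sent in tokens]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])
    return " ".join(sentences[i] for i in chosen)


def rouge1_recall(candidate, reference):
    """Stage 4, evaluation: simplified unigram recall (no stemming)."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / total


if __name__ == "__main__":
    doc, ref = acquire()
    sents, toks = preprocess(doc)
    summary = summarize(sents, toks, k=1)
    print("Summary:", summary)
    print("Unigram recall vs. reference:", round(rouge1_recall(summary, ref), 3))
```

An LLM-based variant would keep the same four-stage shape and swap only the `summarize` stage for a prompted model call, which is one way to read the practical appeal of organizing the survey by process rather than by method family.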