A Hybrid Strategy for Chat Transcript Summarization

Pratik K. Biswas
2024-07-31
Abstract: Text summarization is the process of condensing a piece of text into fewer sentences while preserving its content. A chat transcript, in this context, is a textual copy of a digital or online conversation between a customer (caller) and one or more agents. This paper presents an indigenously (locally) developed hybrid method that first combines extractive and abstractive summarization techniques to compress ill-punctuated or unpunctuated chat transcripts into more readable, punctuated summaries, and then optimizes the overall quality of summarization through reinforcement learning. Extensive testing, evaluation, comparison, and validation have demonstrated the efficacy of this approach for large-scale deployment of chat transcript summarization in the absence of manually generated reference (annotated) summaries.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the problem of generating high-quality automatic summaries of chat transcripts at large scale when no human-written reference (annotated) summaries are available. It proposes a hybrid approach that combines extractive and abstractive summarization techniques and then optimizes overall summary quality through reinforcement learning, producing summaries that are readable and correctly punctuated. The method is tailored to the characteristics of customer-service chat transcripts at telephone companies: discontinuous conversations, widely varying lengths, grammatical errors, and missing punctuation. Because existing open-source summarization tools perform poorly on such transcripts, the paper argues that a domain-specific chat summarizer is needed to meet the needs of telephone companies.
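To make the extract-then-rewrite idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the frequency-based scoring, the stopword list, and the function names (`extract_turns`, `punctuate`, `summarize`) are all assumptions chosen for illustration. The abstractive stage is replaced by a trivial placeholder that only capitalizes and adds a period, and the reinforcement learning stage is omitted entirely.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"the", "a", "an", "is", "i", "my", "you", "for",
             "to", "this", "can", "too", "and", "it"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_turns(turns, k=2):
    """Extractive stage (assumed): keep the k highest-scoring turns.

    A turn's score is the mean transcript-wide frequency of its
    non-stopword tokens. Selected turns keep their original order.
    """
    freq = Counter(w for t in turns for w in tokenize(t)
                   if w not in STOPWORDS)

    def score(turn):
        words = [w for w in tokenize(turn) if w not in STOPWORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(turns)),
                    key=lambda i: score(turns[i]), reverse=True)[:k]
    return [turns[i] for i in sorted(ranked)]


def punctuate(turn):
    """Placeholder for the abstractive stage.

    A real system would rewrite the text with a sequence-to-sequence
    model; this stand-in only capitalizes and adds a final period.
    """
    turn = turn.strip()
    if not turn:
        return turn
    out = turn[0].upper() + turn[1:]
    return out if out[-1] in ".?!" else out + "."


def summarize(turns, k=2):
    """Chain the two stages: select turns, then clean them up."""
    return " ".join(punctuate(t) for t in extract_turns(turns, k))
```

Splitting the pipeline this way means the extractive step only has to shorten the input, so an abstractive model that restores punctuation and fluency could be swapped in for `punctuate` without touching the selection logic.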