SUMM^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents
Mingda Chen, Zewei Chu, Sam Wiseman, Zhi Chen, Lu Chen, Zihan Xu, Yanbin Zhao, Su Zhu, Kai Yu, Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Alexander Fabbri, Faiaz Rahman, Imad Rizvi, Borui, Haoran Wang, Yashar Li, Mehdad Dragomir, Sebastian Gehrmann, Yuntian Deng, Iain McCowan, Jean Carletta, Wessel Kraaij, S. Bourban, M. Flynn, M. Guillemot
2021-01-01
Abstract: Text summarization helps readers capture salient information from documents, news, interviews, and meetings. However, most state-of-the-art pretrained language models (LMs) are unable to efficiently process long text for many summarization tasks. In this paper, we propose SUMM^N, a simple, flexible, and effective multi-stage framework for input texts that are longer than the maximum context length of typical pretrained LMs. SUMM^N first splits the data samples and generates a coarse summary in multiple stages, and then produces the final fine-grained summary based on it. Our framework can process input text of arbitrary length by adjusting the number of stages while keeping the LM input size fixed. Moreover, it can handle both single-source documents and dialogues, and it can be used on top of different backbone abstractive summarization models. To the best of our knowledge, SUMM^N is the first multi-stage split-then-summarize framework for long input summarization. Our experiments demonstrate that SUMM^N outperforms previous state-of-the-art methods, improving ROUGE scores on three long meeting summarization datasets (AMI, ICSI, and QMSum), two long TV series datasets from SummScreen, and a long document summarization dataset (GovReport). Our data and code are available at https://github.com/ANONYMOUS/Summ-N.
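The split-then-summarize control flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tokens are approximated by whitespace-separated words, `summarize` is a hypothetical stand-in for any backbone abstractive model, `max_len` plays the role of the fixed LM input size, and the helper names `split_into_chunks` and `multi_stage_summarize` are invented for this example. Each coarse stage packs whole segments (e.g., utterances or sentences) into chunks that fit the budget, summarizes every chunk, and passes the chunk summaries on to the next stage; once the text fits in a single input, one fine-grained pass produces the final summary.

```python
from typing import Callable, List, Tuple


def _num_tokens(text: str) -> int:
    # Assumption: whitespace words approximate model tokens.
    return len(text.split())


def split_into_chunks(segments: List[str], max_len: int) -> List[str]:
    """Greedily pack whole segments into chunks of at most max_len tokens."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for seg in segments:
        n = _num_tokens(seg)
        if n > max_len:
            raise ValueError("a single segment exceeds max_len; truncate it first")
        # Start a new chunk when adding this segment would overflow the budget.
        if current and current_len + n > max_len:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(seg)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def multi_stage_summarize(
    segments: List[str],
    summarize: Callable[[str], str],  # hypothetical backbone summarizer
    max_len: int,                     # fixed model input size (assumed)
) -> Tuple[str, int]:
    """Run coarse stages until the text fits, then one fine-grained stage.

    Returns the final summary and the number of coarse stages used.
    """
    stages = 0
    total = sum(_num_tokens(s) for s in segments)
    while total > max_len:
        chunks = split_into_chunks(segments, max_len)
        coarse = [summarize(c) for c in chunks]  # one coarse summary per chunk
        new_total = sum(_num_tokens(s) for s in coarse)
        if new_total >= total:
            raise RuntimeError("summarizer did not shorten the input; cannot converge")
        segments, total = coarse, new_total
        stages += 1
    # Fine-grained stage: the concatenated coarse summaries now fit in one input.
    return summarize(" ".join(segments)), stages
```

With a toy summarizer that keeps only the first two words of its input, twelve three-word utterances (36 tokens) and `max_len=6` need two coarse stages (36 to 12 to 4 tokens) before the final pass. Under the toy assumption that every stage shrinks the text by a constant ratio r < 1, the number of coarse stages grows roughly like log(L / max_len) / log(1 / r) for input length L, which is why the input size per model call can stay fixed.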