Two-stage Encoding Extractive Summarization

Wenying Guo, Bin Wu, Bai Wang, Yuanyu Yang
DOI: https://doi.org/10.1109/dsc50466.2020.00060
2020-01-01
Abstract: Pre-trained language models can express the semantics of words or text spans and are widely applied in many NLP tasks; text summarization is no exception. Summarization models are built on a pre-trained model either by fine-tuning it or by using it as a feature extractor. Since the introduction of Bidirectional Encoder Representations from Transformers (BERT; Devlin et al. 2019), many works have built summarization models on BERT and fine-tuned all of its parameters end-to-end. Notably, multiple studies have proposed different strategies for building enhanced versions of BERT, which achieve state-of-the-art performance in many NLP tasks. In this paper, we explore the potential of multiple versions of BERT for text summarization. We present a two-stage encoder model (TSEM) for extractive summarization. The first stage applies A Lite BERT (ALBERT; Lan et al. 2019) to obtain sentence-level embeddings and identify valuable content. The second stage uses a new strategy for fine-tuning BERT to derive meaningful document embeddings, then selects the combination of important sentences that best matches the source document to compose the summary. Experimental results on the CNN/Daily Mail dataset demonstrate that our model is competitive with state-of-the-art results.
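The abstract does not specify how TSEM scores sentences or how a candidate combination is encoded, so the following is only a minimal sketch of the general two-stage "prune, then match against the document" idea, under loudly stated assumptions. The sentence embeddings, sentence scores, and document embedding are hypothetical toy vectors standing in for ALBERT/BERT outputs; a candidate summary is represented by the mean of its sentence embeddings (our simplification, not necessarily the paper's encoder); matching uses cosine similarity (assumed); and `top_k`, `summary_len`, and `select_summary` are hypothetical names.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean; a hypothetical stand-in for encoding a candidate summary."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def select_summary(
    sentence_embs: List[Vector],
    sentence_scores: List[float],
    doc_emb: Vector,
    top_k: int = 4,
    summary_len: int = 2,
) -> Tuple[int, ...]:
    """Return sentence indices of the candidate combination closest to the document."""
    # Stage 1 (sketch): keep only the top_k highest-scoring sentences.
    ranked = sorted(range(len(sentence_scores)),
                    key=lambda i: sentence_scores[i], reverse=True)
    candidates = sorted(ranked[:top_k])

    # Stage 2 (sketch): compare every summary_len-sentence combination
    # with the document embedding and keep the best match.
    best: Tuple[int, ...] = ()
    best_sim = float("-inf")
    for combo in combinations(candidates, summary_len):
        sim = cosine(mean_vector([sentence_embs[i] for i in combo]), doc_emb)
        if sim > best_sim:
            best, best_sim = combo, sim
    return best


# Toy, made-up data: five sentences embedded in 3 dimensions.
sentence_embs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.0, 1.0],
    [0.2, 0.8, 0.0],
]
sentence_scores = [0.9, 0.8, 0.7, 0.1, 0.6]
doc_emb = [0.6, 0.4, 0.0]

print(select_summary(sentence_embs, sentence_scores, doc_emb))  # → (0, 4)
```

In this toy example, greedily taking the two highest-scoring sentences would give `(0, 1)`, whereas matching whole combinations against the document embedding picks `(0, 4)`. That difference is the motivation for a combination-level second stage: sentences that score well individually are not necessarily the set that best represents the document together.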