GiTS: Gist-driven Text Segmentation

Yifeng Ding, Yimeng Dai, Hai-Tao Zheng, Rui Zhang
DOI: https://doi.org/10.1109/ijcnn55064.2022.9892668
2022-01-01
Abstract: Text segmentation is a classical NLP task that aims to split a document into several coherent sections. Existing work typically treats it as a binary classification task, predicting whether a paragraph or sentence ends a section through bottom-up modeling of paragraph-content coherence. However, the high-level gist, i.e., the essential meaning of a section, is not well utilized. In this work, we propose a gist-driven text segmentation model (GiTS), which locates section boundaries using both the top-down section gist and the bottom-up paragraph contents. Specifically, GiTS consists of a gist generator and a boundary predictor. The RNN-based gist generator generates section gists so that the model accounts for high-level content coherence and distinguishability. The boundary predictor predicts boundaries through a pointer network. We also propose an auxiliary gist processor module, which can seamlessly extend GiTS to diverse NLP tasks, such as heading generation, question answering, and text classification. Experimental results on text segmentation show that our model outperforms the RNN-based state-of-the-art model by 6.83% on average in terms of P_k. We also show that combining GiTS with the auxiliary gist processor outperforms the state-of-the-art heading generation models by 22.68% in accuracy and 44.20% in ROUGE-1.
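The abstract reports segmentation results in terms of P_k without defining it. P_k (Beeferman et al., 1999) is an error rate, so lower is better: it slides a probe of width k over the document and counts how often the reference and the hypothesis disagree about whether the two probed units fall in the same section. The sketch below is an illustration of the standard metric only, not the paper's evaluation code. It assumes each segmentation is given as a list of end-of-section flags (1 = this paragraph closes a section), matching the binary framing described above; the helper names `to_segment_ids` and `pk` are hypothetical, and the default k (half the mean reference section length) is a common convention rather than something the abstract specifies.

```python
from typing import Optional, Sequence


def to_segment_ids(ends: Sequence[int]) -> list[int]:
    """Map end-of-section flags (1 = unit closes a section) to per-unit section ids."""
    ids, current = [], 0
    for flag in ends:
        ids.append(current)
        if flag:
            current += 1  # the next unit starts a new section
    return ids


def pk(reference: Sequence[int], hypothesis: Sequence[int], k: Optional[int] = None) -> float:
    """P_k error: fraction of probe pairs (i, i + k) on which the reference and
    hypothesis disagree about whether both units lie in the same section."""
    if not reference or len(reference) != len(hypothesis):
        raise ValueError("segmentations must be non-empty and cover the same units")
    ref_ids = to_segment_ids(reference)
    hyp_ids = to_segment_ids(hypothesis)
    n = len(ref_ids)
    if k is None:
        # Assumed convention: half the mean reference section length, at least 1.
        num_sections = ref_ids[-1] + 1
        k = max(1, round(n / num_sections / 2))
    if k >= n:
        raise ValueError("window size k must be smaller than the document length")
    errors = 0
    for i in range(n - k):
        same_ref = ref_ids[i] == ref_ids[i + k]
        same_hyp = hyp_ids[i] == hyp_ids[i + k]
        errors += same_ref != same_hyp  # count one disagreement per probe
    return errors / (n - k)


# Usage: a six-paragraph document with two three-paragraph sections,
# and a hypothesis that places the first boundary one paragraph too early.
reference = [0, 0, 1, 0, 0, 1]
hypothesis = [0, 1, 0, 0, 0, 1]
print(pk(reference, hypothesis))  # k defaults to 2 here; prints 0.5
print(pk(reference, reference))   # a perfect segmentation prints 0.0
```

Representing segmentations as per-unit section ids makes the "same section?" test a single equality check, so the same `pk` function works whether the units are paragraphs or sentences.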
What problem does this paper attempt to address?