Tipster: A Topic-Guided Language Model for Topic-Aware Text Segmentation.

Zheng Gong, Shiwei Tong, Han Wu, Qi Liu, Hanqing Tao, Wei Huang, Runlong Yu
DOI: https://doi.org/10.1007/978-3-031-00129-1_14
2022-01-01
Abstract: Accurately segmenting plain documents and labeling their topical structure not only suits people's reading habits but also facilitates various downstream tasks. Recent work consistently suggests that text segmentation and segment topic labeling can be treated as mutually beneficial tasks, and that incorporating word distributions has the potential to better model the latent topics in a document. To this end, we present a novel model, Tipster, that solves text segmentation and segment topic labeling collaboratively. We first utilize a neural topic model to infer the latent topic distributions of sentences from their word distributions. Then, our model divides the document into topically coherent segments based on the topic-guided contextual sentence representations of a pre-trained language model and assigns relevant topic labels to each segment. Finally, we conduct extensive experiments which demonstrate that Tipster achieves state-of-the-art performance in both text segmentation and segment topic labeling.
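
To make the idea of joint segmentation and topic labeling concrete, the following is a minimal illustrative sketch, not the Tipster model itself. It assumes each sentence already comes with a topic distribution (for example, from some topic model), places a segment boundary wherever the dominant topic changes, and labels each segment with the top topic of its averaged distribution. The function name `segment_by_topic` and the boundary heuristic are assumptions made for illustration; the paper instead learns boundaries from topic-guided contextual sentence representations.

```python
from typing import List, Tuple


def segment_by_topic(topic_dists: List[List[float]]) -> List[Tuple[int, int, int]]:
    """Split sentences into segments and label each segment with a topic.

    topic_dists[i] is a (hypothetical) topic distribution for sentence i.
    Returns (start, end_exclusive, topic_label) triples.
    """
    if not topic_dists:
        return []

    # Dominant topic index of each sentence (argmax of its distribution).
    dominant = [max(range(len(d)), key=d.__getitem__) for d in topic_dists]

    # Heuristic assumption: a boundary sits wherever the dominant topic changes.
    boundaries = [0]
    boundaries += [i for i in range(1, len(dominant)) if dominant[i] != dominant[i - 1]]
    boundaries.append(len(dominant))

    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        n_topics = len(topic_dists[start])
        # Average the topic distributions of the sentences in this segment.
        mean = [
            sum(topic_dists[i][k] for i in range(start, end)) / (end - start)
            for k in range(n_topics)
        ]
        # The segment label is the top topic of the averaged distribution.
        label = max(range(n_topics), key=mean.__getitem__)
        segments.append((start, end, label))
    return segments


if __name__ == "__main__":
    # Five sentences over two topics; the values are made up for illustration.
    dists = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
    print(segment_by_topic(dists))
```

A learned model would replace the argmax-change heuristic with a boundary classifier over sentence representations; the sketch only shows how segment boundaries and segment labels can be derived from the same per-sentence topic signal.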