Research on Text Segmentation Based on Topic Analysis

刘铭,王晓龙,刘远超
DOI: https://doi.org/10.3321/j.issn:0372-2112.2009.02.007
2009-01-01
Abstract: This paper proposes a novel topic segmentation algorithm. The algorithm first partitions the text into blocks, then constructs whole-length lexical chains to analyze the multiple subtopics of the text. By constructing a graph that describes which blocks cover which subtopics, it groups together similar blocks that describe the same subtopic. To handle cases where a segmentation point falls inside a block, it segments the blocks a second time. Experimental results show that, by analyzing the topics of the text, the algorithm removes from the segmentation results the interference caused by irrelevant features. The graph of blocks and the subtopics they cover lets it combine the similarities of both adjacent and non-adjacent blocks, which increases segmentation precision. The second segmentation pass makes the segmentation results more reasonable.
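The abstract gives only the outline of the pipeline, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes fixed-size sentence blocks, uses plain word repetition as a stand-in for lexical chains, scores every pair of blocks by the number of chains they share, and places a boundary wherever two adjacent blocks share fewer chains than a threshold. All function names, the length-based stop-word filter, and the threshold are hypothetical, and the second segmentation inside blocks is not modeled.

```python
import re
from collections import defaultdict


def split_blocks(text, sentences_per_block=2):
    """Split text into blocks of a fixed number of sentences (assumed block scheme)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]


def lexical_chains(blocks, min_blocks=2):
    """Map each repeated word to the set of block indices it occurs in.

    Plain word repetition stands in for a real lexical chain here (assumption).
    """
    chains = defaultdict(set)
    for idx, block in enumerate(blocks):
        for word in re.findall(r"[a-z]+", block.lower()):
            if len(word) > 3:  # crude stop-word filter, hypothetical
                chains[word].add(idx)
    return {w: ids for w, ids in chains.items() if len(ids) >= min_blocks}


def block_similarity(chains, n_blocks):
    """Count shared chains for every pair of blocks, adjacent or not."""
    sim = [[0] * n_blocks for _ in range(n_blocks)]
    for ids in chains.values():
        for i in ids:
            for j in ids:
                if i != j:
                    sim[i][j] += 1
    return sim


def boundaries(sim, threshold=1):
    """Return each index i where block i shares fewer than `threshold` chains with block i + 1."""
    return [i for i in range(len(sim) - 1) if sim[i][i + 1] < threshold]
```

Calling `split_blocks`, `lexical_chains`, `block_similarity`, and `boundaries` in that order yields the indices of the blocks after which a topic boundary is placed. Only adjacent-pair scores are used for the final decision in this sketch; the full pairwise matrix is kept because the abstract stresses combining similarities of adjacent and non-adjacent blocks.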