Using quantitative context relevance analysis for text segmentation

Maosheng Zhong, Hui Liu, Lei Liu
2009-01-01
Journal of Computational Information Systems
Abstract: Text segmentation divides a text into topic-coherent sections. Once a text is segmented, Natural Language Processing tasks such as text classification, text summarization, information retrieval, and question answering become easier to carry out. Two key problems in text segmentation are how to determine whether the context of a text is relevant, and how to apply the result of that context relevance analysis to detect topic breaks between topic sections. In this paper, we present a practical method for measuring context relevance based on Quantified Conceptual Relations of word pairs extracted from the Modern Chinese Standard Dictionary. We build a scoring model that uses the quantitative relevance of the context to score each gap point between sentences, which yields sentence-level text segmentation. The experimental results show that the mean and minimum P_k error rates of our method for identifying segment boundaries are the lowest among state-of-the-art methods for Chinese text segmentation. 1553-9105/ Copyright © 2009 Binary Information Press.
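The error rate the abstract reports is commonly written P_k. The sketch below assumes it is the standard windowed metric for segmentation: the fraction of sentence pairs, k positions apart, on which the reference and the predicted segmentation disagree about whether both sentences lie in the same segment. The function name `pk_error`, the per-sentence segment-label input format, and the default choice of k (half the mean reference segment length) are illustrative assumptions, not details taken from the paper.

```python
def pk_error(reference, hypothesis, k=None):
    """Windowed P_k error rate for sentence-level segmentation (illustrative).

    reference, hypothesis: one segment label per sentence, e.g. [0, 0, 1, 1].
    k: window size; by default half the mean reference segment length
       (an assumed convention, not taken from the paper).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("both segmentations must cover the same sentences")
    n = len(reference)

    if k is None:
        # Count reference segments as label changes plus one.
        segments = 1 + sum(a != b for a, b in zip(reference, reference[1:]))
        k = max(1, round(n / (2 * segments)))
    if not 0 < k < n:
        raise ValueError("window size k must satisfy 0 < k < number of sentences")

    errors = 0
    for i in range(n - k):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        # An error is any window where the two segmentations disagree.
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / (n - k)


if __name__ == "__main__":
    gold = [0, 0, 0, 1, 1, 1]       # true topic break after sentence 3
    predicted = [0, 0, 0, 0, 0, 0]  # a system that found no break
    print(pk_error(gold, predicted, k=2))
```

Lower values are better: a prediction identical to the reference scores 0, and every window that straddles a missed or spurious boundary adds to the error.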