Segmentation of Chinese Discourse in Content-Based Information Retrieval.

Samuel W. K. Chan, Benjamin Ka-Yin T'sou
DOI: https://doi.org/10.5555/2835865.2835887
2000-01-01
Abstract: In this paper, we present a novel approach to automatic discourse segmentation that does not require full semantic understanding. In order to analyse the textual bonds and determine the degree of coherence that a discourse may exhibit, we first represent the tremendous diversity of textual relations as a discourse network. A set of mutual linguistic constraints that largely determines the similarity of meaning among lexical items is encoded. Topic boundaries in a discourse are identified through a computational method which identifies the segment cluster from a higher order structure in the discourse network. Segmentation is then regarded as a process of identifying the shifts from one segment cluster to another. Experimental results show that our formulation is capable of addressing the topic shifts of texts. Comparison with a related method demonstrates that the combination of constraints is closely related to the topic boundaries among textual segments. Evaluation using recall and precision shows the effectiveness of our approach on a collection of Chinese newswire articles.
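The abstract states that segmentation is evaluated with recall and precision but does not give the matching criterion. As a minimal sketch, assuming that a predicted topic boundary counts as correct only when it falls at exactly the same position as a reference boundary, the two measures could be computed as follows. The function name `boundary_precision_recall` and the example boundary positions are hypothetical illustrations, not taken from the paper.

```python
def boundary_precision_recall(predicted, reference):
    """Score predicted topic boundaries against reference boundaries.

    Both arguments are iterables of boundary positions (for example,
    the index of the sentence after which a new topic starts).
    Assumption: only exact position matches count as hits.
    """
    predicted = set(predicted)
    reference = set(reference)
    hits = len(predicted & reference)

    # Precision: fraction of predicted boundaries that are correct.
    precision = hits / len(predicted) if predicted else 0.0
    # Recall: fraction of reference boundaries that were found.
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical boundary positions, for illustration only.
precision, recall = boundary_precision_recall([3, 7, 12], [3, 8, 12, 15])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching is the strictest choice; a tolerance window around each reference boundary is a common relaxation, though whether the authors used one is not stated here.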