Looking at discourse in a corpus: The role of lexical cohesion

Tony Berber Sardinha
DOI: https://doi.org/10.48550/arXiv.cs/0004016
2000-04-28
Abstract: This paper reports on the development and application of a computer model for discourse analysis through segmentation. Segmentation refers to the principled division of texts into contiguous constituents. Other studies have looked at the application of a number of models to the analysis of discourse by computer. The segmentation procedure developed for the present investigation is called LSM ('Link Set Median'). It was applied to three corpora of 300 texts from three different genres. The results obtained by applying the LSM procedure to the corpora were then compared to segmentation carried out at random. Statistical analyses suggested that LSM significantly outperformed random segmentation, thus indicating that the segmentation was meaningful.
Computation and Language