Discourse parsing of sociology dissertation abstracts using decision tree induction

Shiyan Ou, Hui Ying Heng, Dion Hoe-Lian Goh, Christopher S. G. Khoo
DOI: https://doi.org/10.7152/acro.v14i1.14114
2003-01-01
Advances in Classification Research Online
Abstract: In this study, we investigated the use of decision tree induction to parse the macro-level discourse structure of sociology dissertation abstracts. We treated discourse parsing as a sentence categorization task. The attributes used in constructing the decision tree models were stemmed words that occurred in at least 35 sentences (out of 3694 sentences in 300 sample abstracts). Sentence location information was also used. The model obtained an accuracy rate of 71.3% when applied to a test sample of 100 abstracts. Another model that made use of information regarding the presence of 31 indicator words in neighboring sentences was also developed. Although this model did not obtain better results, a comparison of the two models suggests that combining the models could improve the classification of sentences in the problem statement and research method sections.
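The attribute construction described in the abstract (stemmed words kept only if they occur in enough sentences, plus sentence location) can be illustrated with a minimal sketch. It is not the authors' implementation: `crude_stem` is a hypothetical stand-in for a real stemmer, the function names and the `_position` feature are assumptions made for illustration, and the toy data uses a threshold of 2 sentences where the study used 35. The decision tree induction step itself is not shown.

```python
import re
from collections import Counter


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer: strips a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(sentence):
    # Set of stemmed lowercase words, so each word counts once per sentence.
    return {crude_stem(w) for w in re.findall(r"[a-z]+", sentence.lower())}


def select_attributes(abstracts, min_sentences):
    # Keep stemmed words that occur in at least `min_sentences` sentences.
    counts = Counter()
    for sentences in abstracts:
        for sentence in sentences:
            counts.update(tokenize(sentence))
    return sorted(w for w, c in counts.items() if c >= min_sentences)


def featurize(abstracts, vocabulary):
    # One row per sentence: binary word-presence attributes plus a
    # normalized location (0.0 = first sentence, 1.0 = last sentence).
    rows = []
    for sentences in abstracts:
        n = len(sentences)
        for i, sentence in enumerate(sentences):
            stems = tokenize(sentence)
            row = {w: int(w in stems) for w in vocabulary}
            row["_position"] = i / (n - 1) if n > 1 else 0.0
            rows.append(row)
    return rows


# Toy data (invented for illustration): each abstract is a list of sentences.
abstracts = [
    ["This study examines family roles.",
     "Interviews were conducted with parents.",
     "Results show changing roles."],
    ["This study explores migration.",
     "Surveys were collected.",
     "Results show strong effects."],
]
vocabulary = select_attributes(abstracts, min_sentences=2)
rows = featurize(abstracts, vocabulary)
```

Rows of this shape, paired with a section label for each sentence, could then be passed to any decision tree learner to treat discourse parsing as sentence categorization.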