Discrimination of Chinese quantitative style features based on text clustering

Hou Renkui, Jiang Minghu
DOI: https://doi.org/10.1109/ICoSP.2012.6492018
2012-01-01
Abstract: The programs “News Broadcast” and “Qiang Qiang Conversation between Three Individuals” differ in style: the former is broadcast speech, while the latter is conversational. This paper collects a corpus from both programs and selects sentence length, word length, and the part of speech (POS) of the sentence-initial word as the features used to generate text vectors. The texts are then clustered using Euclidean distance and Ward's algorithm. The analysis shows that sentence length, word length, and sentence-initial word POS can serve as quantitative stylistic features of Chinese.
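The pipeline described in the abstract (feature vectors per text, then Ward clustering on Euclidean distance) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the toy sentences and their POS tags are made up, the feature definitions (mean sentence length in words, mean word length in characters, share of sentences opening with each tag in a hypothetical `POS_TAGS` list) are one plausible reading of the abstract, and the helper names `text_vector` and `ward_cluster` are illustrative. The clustering is a naive pure-Python version of Ward's criterion, not an optimized implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag inventory for the sentence-initial POS feature.
POS_TAGS = ["n", "r"]

def text_vector(sentences, pos_tags):
    """Build a feature vector from a text given as a list of sentences,
    each sentence being a list of (word, pos) pairs (already segmented)."""
    n_sent = len(sentences)
    n_words = sum(len(s) for s in sentences)
    mean_sent_len = n_words / n_sent                    # words per sentence
    mean_word_len = sum(len(w) for s in sentences for w, _ in s) / n_words
    initial = Counter(s[0][1] for s in sentences)       # POS of first word
    return [mean_sent_len, mean_word_len] + [initial[t] / n_sent for t in pos_tags]

def sq_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_cluster(vectors, k):
    """Agglomerative clustering with Ward's criterion: repeatedly merge the
    two clusters whose union least increases the within-cluster sum of
    squares, until k clusters remain. Returns lists of member indices."""
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]

    def cost(pair):
        (ma, ca), (mb, cb) = clusters[pair[0]], clusters[pair[1]]
        na, nb = len(ma), len(mb)
        return na * nb / (na + nb) * sq_euclidean(ca, cb)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [(ma + mb, centroid)]
    return [sorted(members) for members, _ in clusters]

# Made-up toy texts: two broadcast-like, two conversation-like.
texts = [
    [[("政府", "n"), ("今天", "t"), ("发布", "v"), ("报告", "n")],
     [("代表团", "n"), ("访问", "v"), ("北京", "ns")]],
    [[("会议", "n"), ("讨论", "v"), ("经济", "n"), ("发展", "v")],
     [("委员会", "n"), ("通过", "v"), ("决议", "n")]],
    [[("我", "r"), ("觉得", "v"), ("好", "a")], [("你", "r"), ("说", "v")]],
    [[("他", "r"), ("去", "v"), ("吗", "y")], [("我", "r"), ("想", "v")]],
]

vectors = [text_vector(t, POS_TAGS) for t in texts]
labels = ward_cluster(vectors, 2)
print(labels)
```

On real data the features sit on different scales (sentence lengths versus proportions), so standardizing each dimension before computing distances would likely be needed; the abstract does not say whether the authors did this.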