Data mining Mandarin tone contour shapes

Shuo Zhang
DOI: https://doi.org/10.48550/arXiv.1907.01668
2019-07-02
Computation and Language
Abstract: In spontaneous speech, Mandarin tones that belong to the same tone category may exhibit many different contour shapes. We explore the use of data mining and NLP techniques for understanding the variability of tones in a large corpus of Mandarin newscast speech. First, we adapt a graph-based approach to characterize the clusters (fuzzy types) of tone contour shapes observed in each tone n-gram category. Second, we show correlations between these realized contour shape types and a bag of automatically extracted linguistic features. We discuss the implications of the current study within the context of phonological and information theory.