Text Normalization in Mandarin Text-to-speech System.

Yuxiang Jia, Dezhi Huang, Wu Liu, Yuan Dong, Shiwen Yu, Haila Wang
DOI: https://doi.org/10.1109/icassp.2008.4518704
2008-01-01
Abstract: Text normalization is an important component of a text-to-speech system, and its main difficulty is disambiguating non-standard words (NSWs). This paper develops a taxonomy of NSWs based on a large-scale Chinese corpus and proposes a two-stage NSW disambiguation strategy: finite state automata (FSA) for initial classification, followed by maximum entropy (ME) classifiers for subclass disambiguation. Using this taxonomy, the two-stage approach achieves an F-score of 98.53% in the open test, 5.23% higher than that of the FSA-based approach. Experiments show that the NSW taxonomy gives the FSA a high baseline performance, that the ME classifiers make a considerable improvement, and that the two-stage approach adapts well to new domains.
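To make the two-stage idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes regular expressions as a stand-in for the finite state automata and a small log-linear scorer in the maximum entropy form for the second stage. The class names (DATE, DIGITS, YEAR, and so on), the context features, and the weights are all hypothetical; in particular, the weights are hand-set for illustration, whereas a real ME classifier would estimate them from annotated data.

```python
import math
import re

# Stage 1: regular expressions stand in for the finite state automata.
# The coarse class names are hypothetical, not the paper's taxonomy.
COARSE_PATTERNS = [
    ("DATE", re.compile(r"\d{4}-\d{1,2}-\d{1,2}")),
    ("PERCENT", re.compile(r"\d+(\.\d+)?%")),
    ("COLON_PAIR", re.compile(r"\d{1,2}:\d{1,2}")),  # ambiguous: time or score
    ("DIGITS", re.compile(r"\d+")),                  # ambiguous: year or cardinal
]

# Subclasses that stage 2 chooses between for each ambiguous coarse class.
SUBCLASSES = {
    "DIGITS": ["YEAR", "CARDINAL"],
    "COLON_PAIR": ["TIME", "SCORE"],
}

# Hand-set weights for (subclass, feature) pairs. A real ME classifier
# would estimate these from annotated data; these values are made up.
WEIGHTS = {
    ("YEAR", "next=年"): 3.0,
    ("YEAR", "len=4"): 1.0,
    ("CARDINAL", "bias"): 0.5,
    ("CARDINAL", "next=元"): 3.0,
    ("TIME", "bias"): 0.5,
    ("TIME", "prev=下午"): 3.0,
    ("SCORE", "prev=比分"): 3.0,
}


def coarse_class(token):
    """Stage 1: return the first coarse class whose pattern matches the whole token."""
    for name, pattern in COARSE_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "OTHER"


def features(token, prev, nxt):
    """Context features: a bias term, the neighbouring words, and token length."""
    return ["bias", f"prev={prev}", f"next={nxt}", f"len={len(token)}"]


def subclass_probs(coarse, feats):
    """Stage 2: softmax over summed feature weights (the log-linear ME form)."""
    scores = {
        label: sum(WEIGHTS.get((label, f), 0.0) for f in feats)
        for label in SUBCLASSES[coarse]
    }
    z = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / z for label, s in scores.items()}


def classify(token, prev="", nxt=""):
    """Run both stages; unambiguous coarse classes are returned directly."""
    coarse = coarse_class(token)
    if coarse not in SUBCLASSES:
        return coarse
    probs = subclass_probs(coarse, features(token, prev, nxt))
    return max(probs, key=probs.get)


print(classify("2008", nxt="年"))     # YEAR
print(classify("2008", nxt="元"))     # CARDINAL
print(classify("3:15", prev="比分"))  # SCORE
print(classify("98.53%"))             # PERCENT
```

The split mirrors the division of labor described in the abstract: the pattern stage settles tokens whose surface form is decisive, and the probabilistic stage is consulted only where the same form admits several readings.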