Text Normalization in Chinese Text-to-Speech System

Jia Yuxiang (贾玉祥), Huang Dezhi (黄德智), Liu Wu (刘武), Yu Shiwen (俞士汶)
DOI: https://doi.org/10.3969/j.issn.1003-0077.2008.05.006
2008-01-01
Abstract: Chinese text normalization is the process of transforming non-Chinese character strings into their corresponding Chinese character strings so that their pronunciations can be determined. The difficulty of this task lies mainly in two aspects: the large number of non-Chinese character strings in varied formats, and their high degree of ambiguity. This paper develops an effective taxonomy of non-Chinese character strings based on the concept of Non-Standard Words (NSWs). It then proposes a three-layer normalization model comprising NSW detection, NSW disambiguation, and standard word generation. In the NSW disambiguation stage, a machine learning method is employed to overcome the shortcomings of the rule-based method. Experimental results show that this approach achieves high performance and adapts well to new domains. Accuracy on the open test is 98.64%.
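The three layers named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the taxonomy or the classifier, so the categories (`year`, `digit_string`, `cardinal`), the function names, and the hand-written rules in `disambiguate` are all assumptions made for illustration. In particular, the paper uses a machine learning method for disambiguation, whereas this sketch substitutes a toy rule so that the example stays self-contained. Only ASCII digit strings are handled.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]

# Layer 1: NSW detection (assumption: only runs of ASCII digits count as NSWs).
NSW_PATTERN = re.compile(r"[0-9]+")


def detect(text):
    """Return regex matches for every candidate NSW in the text."""
    return list(NSW_PATTERN.finditer(text))


# Layer 2: NSW disambiguation.
# Hypothetical rule-based stand-in for the paper's machine-learned classifier.
def disambiguate(match, text):
    token = match.group()
    following = text[match.end():match.end() + 1]
    if len(token) == 4 and following == "年":
        return "year"          # read digit by digit, e.g. 2008年
    if len(token) > 4:
        return "digit_string"  # e.g. phone numbers, IDs
    return "cardinal"


def read_digit_by_digit(token):
    return "".join(DIGITS[int(c)] for c in token)


def read_as_cardinal(token):
    """Read an integer below 10000 as a Chinese cardinal number."""
    n = int(token)
    if n >= 10000:
        raise ValueError("this sketch only covers values below 10000")
    if n == 0:
        return "零"
    digits = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(digits):
        d = int(ch)
        if d == 0:
            pending_zero = True
            continue
        if pending_zero and parts:
            parts.append("零")  # one 零 stands in for any run of inner zeros
        pending_zero = False
        parts.append(DIGITS[d] + UNITS[len(digits) - 1 - i])
    reading = "".join(parts)
    # 10 to 19 are read 十, 十一, ... rather than 一十, 一十一, ...
    if reading.startswith("一十"):
        reading = reading[1:]
    return reading


# Layer 3: standard word generation.
def generate(token, category):
    if category in ("year", "digit_string"):
        return read_digit_by_digit(token)
    return read_as_cardinal(token)


def normalize(text):
    """Run detection, disambiguation, and generation over one string."""
    pieces = []
    last = 0
    for m in detect(text):
        pieces.append(text[last:m.start()])
        pieces.append(generate(m.group(), disambiguate(m, text)))
        last = m.end()
    pieces.append(text[last:])
    return "".join(pieces)


print(normalize("2008年有15人"))
```

The same digit string receives different readings depending on its category, which is the ambiguity the abstract refers to: `2008` before 年 is read digit by digit, while `15` is read as a cardinal number. Keeping the three layers as separate functions means the rule in `disambiguate` could be swapped for a trained classifier without touching detection or generation.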