G2P based on multi-sensing field and modified focus loss

Jinbo Zhang, Donghong Qin, Yang Li, Xiao Liang
DOI: https://doi.org/10.1145/3524304.3524314
2022-01-01
Abstract: Text-to-phoneme conversion is an important part of speech synthesis, and disambiguating Chinese polyphonic characters is one of its biggest challenges. This paper proposes a BiLSTM architecture for disambiguating Chinese polyphonic characters. A Multi-Receptive Field Fusion module enlarges the model's receptive field, improving its ability to capture contextual information. The paper also introduces a modified focus loss to address the unbalanced distribution of polyphonic data. Experimental results show that the proposed architecture achieves 97.69% accuracy and converges faster than other G2P models.
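The abstract does not say how the loss is modified. As a rough illustration of the kind of loss it likely builds on, the sketch below implements the standard multi-class focal loss, which down-weights examples the model already classifies confidently so that rare pronunciations contribute more to training. The function names and the defaults `gamma=2.0` and `alpha=1.0` are assumptions for illustration, not the authors' formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one example (NOT the paper's modified variant).

    logits: raw scores, one per candidate pronunciation (hypothetical setup)
    target: index of the correct pronunciation
    gamma:  focusing parameter; gamma=0 reduces to plain cross-entropy
    alpha:  scalar weight (assumed constant here)
    """
    p_t = softmax(logits)[target]
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def cross_entropy(logits, target):
    """Plain cross-entropy, expressed as focal loss with gamma=0."""
    return focal_loss(logits, target, gamma=0.0)

# A confidently correct example versus a confidently wrong one.
easy = ([4.0, 0.0], 0)
hard = ([0.0, 4.0], 0)
for name, (logits, target) in (("easy", easy), ("hard", hard)):
    print(name, cross_entropy(logits, target), focal_loss(logits, target))
```

With `gamma=0` the function reduces to ordinary cross-entropy. Larger `gamma` values suppress the easy example's loss far more than the hard one's, which is the property that makes this family of losses useful for imbalanced label distributions.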