Comparison of Two Cross-lingual AF Extraction Methods

Shixuan Du, Qingran Zhan, Yahui Shan, Xiang Xie
DOI: https://doi.org/10.1109/icicsp48821.2019.8958606
2019-01-01
Abstract: In this paper we propose two different cross-lingual articulatory feature (AF) extraction methods and build recognition systems based on cross-lingual AFs. The AF extractors are trained on the source language (English), and cross-lingual AFs are generated for the target language (Mandarin) using the trained extractors. Experiments are carried out with two kinds of AF extraction architectures: a multilayer perceptron (MLP) and Bidirectional Long Short-Term Memory (BLSTM) based connectionist temporal classification (CTC). The MLP architecture requires frame-level AF labels, which are converted from phone alignments obtained from a GMM-HMM using a phone-to-AF mapping, while the BLSTM-based CTC eliminates the need for alignments. The Mandarin speech recognition system is built on joint features formed by concatenating AFs with MFCCs. The results show that using cross-lingual AFs can improve ASR performance on THCHS-30. Of the two architectures, cross-lingual AFs extracted using BLSTM-based CTC give better recognition performance.
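The two data-preparation steps named in the abstract, expanding a phone alignment into frame-level AF labels and concatenating AFs with MFCCs, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `PHONE_TO_AF` table, the function names, and the toy feature values are hypothetical and are not taken from the paper.

```python
# Hypothetical phone-to-AF table (illustrative only, not the paper's mapping).
PHONE_TO_AF = {
    "sil": {"manner": "silence", "place": "none"},
    "b": {"manner": "stop", "place": "bilabial"},
    "a": {"manner": "vowel", "place": "low"},
}


def frame_af_labels(alignment, af_type):
    """Expand a phone alignment [(phone, n_frames), ...] into per-frame AF labels."""
    labels = []
    for phone, n_frames in alignment:
        labels.extend([PHONE_TO_AF[phone][af_type]] * n_frames)
    return labels


def concat_features(mfcc_frames, af_frames):
    """Concatenate per-frame MFCC and AF vectors into joint feature vectors."""
    if len(mfcc_frames) != len(af_frames):
        raise ValueError("MFCC and AF streams must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, af_frames)]


# Toy alignment: 2 frames of silence, 1 frame of "b", 3 frames of "a".
alignment = [("sil", 2), ("b", 1), ("a", 3)]
manner_labels = frame_af_labels(alignment, "manner")

# Toy 2-dimensional "MFCC" frames joined with 3-dimensional "AF" vectors.
joint = concat_features([[0.1, 0.2]] * 6, [[1.0, 0.0, 0.0]] * 6)
```

In this sketch the frame-level labels are what an MLP-style extractor would be trained on; a CTC-based extractor would instead take the unaligned AF label sequence, which is why it does not need the alignment step.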