On Continuous Speech Recognition of Indian English

Xin Jin, Keliang Zhang, Xian Huang, Min Miao
DOI: https://doi.org/10.1145/3302425.3302489
2018-12-21
Abstract: Indian English (IE) derives from British English, but the two differ in many respects. The differences range from speech sounds to vocabulary, and IE is a typical variety of English. It has developed distinctive features of its own, of which the phonological features are the most remarkable. At present, there is little research on continuous speech recognition (CSR) of lesser-known English varieties such as Indian English. Compared with British English (or American English), which has a large amount of annotated speech data, IE is a relatively low-resourced language. Moreover, existing English CSR systems perform unsatisfactorily on spontaneous IE conversations. More generally, CSR of low-resourced languages (minority languages, dialects, and varieties of a language with relatively little annotated speech data) still performs unsatisfactorily. Taking Indian English as an example, this paper focuses on CSR of IE under low-resourced conditions: it extracts acoustic features, trains acoustic models with different methods, and explores effective methods for recognizing low-resourced languages. Firstly, we extract features with MFCC and PLP respectively and use a GMM-HMM acoustic model to build the baseline system. Secondly, we build six further acoustic models using BLSTM-RNN and TDNN neural networks, evaluate them on the test set, and analyze the results in detail. Thirdly, we take an acoustic model of American English trained on a large amount of annotated data and transfer it to optimize the acoustic model of Indian English. Finally, we run decoding on the test set and analyze the experimental results. The results show that both neural network recognition systems improve with transfer learning, and that the improvement is more remarkable for the BLSTM-RNN.