An exploration of semi-supervised and language-adversarial transfer learning using hybrid acoustic model for Hindi speech recognition

Ankit Kumar, Rajesh Kumar Aggarwal
DOI: https://doi.org/10.1007/s40860-021-00140-7
2021-04-08
Journal of Reliable Intelligent Environments
Abstract: Semi-supervised training and language-adversarial transfer learning are two different techniques for improving Automatic Speech Recognition (ASR) performance in limited-resource conditions. In this paper, we combine these two techniques and present a common framework for the Hindi ASR system. For acoustic modeling, we propose a hybrid architecture of SincNet-Convolutional Neural Network (CNN)-Light Gated Recurrent Unit (LiGRU), which offers increased interpretability, high accuracy, and fewer parameters. We investigate the impact of the proposed hybrid model on monolingual Hindi ASR with semi-supervised training and on multilingual Hindi ASR with language-adversarial transfer learning. For multilingual training, we chose three Indian languages (Hindi, Marathi, Bengali) from the same Indo-Aryan family. All experiments were conducted using the Kaldi and PyTorch-Kaldi toolkits. The proposed model with the combined learning strategies achieves the lowest Word Error Rate (WER) for Hindi ASR, 5.5%.
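For readers unfamiliar with the metric reported above, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that standard definition, not the scoring procedure used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative only: tokenizes on whitespace, which is an assumption
    and not necessarily how the paper's transcripts were scored.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, a WER of 5.5% corresponds to roughly 5.5 word-level errors per 100 reference words.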