Abstract: A machine translation system (MTS) is an effective communication tool that translates text or speech from one language into another. The need for an efficient translation system is evident in a large multilingual environment such as India, where English and a set of Indian languages (ILs) are officially used. In contrast to English, ILs are still treated as low-resource languages because corpora are unavailable. Multilingual neural machine translation (MNMT) has emerged as a well-suited approach to this asymmetry. In this paper, we propose an MNMT system to address the issues of low-resource language translation. Our model comprises two MNMT systems, one for English-Indic (one-to-many) and the other for Indic-English (many-to-one) translation, with a shared encoder-decoder covering 15 language pairs (30 translation directions). Since most IL pairs have only a scant amount of parallel corpora, insufficient for training a machine translation model, we explore various augmentation strategies to improve overall translation quality with the proposed model. The model is realized with a state-of-the-art transformer architecture. Experiments on a substantial amount of data show that it outperforms conventional models. In addition, the paper examines the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages. The experimental results also show that backtranslation and domain adaptation for ILs enhance the translation quality of both source and target languages. Using all these approaches, our proposed model proves more effective than the baseline model in terms of the BLEU (BiLingual Evaluation Understudy) score for a set of ILs.
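
The abstract does not say how the shared model is told which language to produce, nor how backtranslated data is built. The following is a minimal sketch under two assumptions that are not taken from the paper: a target-language token such as `<2hi>` is prepended to the source sentence in the one-to-many direction (a common MNMT convention), and backtranslation is done by a reverse-direction model, represented here by the hypothetical placeholder `toy_reverse_model`. All function names are illustrative only.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_one_to_many(english: str, indic: str, lang: str) -> Pair:
    """English-Indic example: prefix the source with a target-language token.

    The "<2xx>" token format is an assumption, not the paper's stated scheme.
    """
    return (f"<2{lang}> {english}", indic)


def tag_many_to_one(indic: str, english: str) -> Pair:
    """Indic-English example: the target is always English, so no tag is added."""
    return (indic, english)


def back_translate(
    monolingual_targets: List[str],
    reverse_model: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic pairs by translating target-side text back to the source."""
    return [(reverse_model(t), t) for t in monolingual_targets]


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical stand-in for a trained Indic-English model."""
    return f"[synthetic English for: {sentence}]"


# Genuine parallel data for the English-Hindi direction.
real = [tag_one_to_many("How are you?", "आप कैसे हैं?", "hi")]

# Synthetic parallel data derived from monolingual Hindi text.
synthetic = [
    tag_one_to_many(src, tgt, "hi")
    for src, tgt in back_translate(["धन्यवाद"], toy_reverse_model)
]

training_data = real + synthetic
```

In this sketch the synthetic pairs keep genuine target-side text and only the source side is machine-generated, which is the usual motivation for backtranslation when monolingual data in the target language is easier to obtain than parallel data.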

Improving Multilingual Neural Machine Translation System for Indic Languages
