Abstract: Automatic language identification (LID), the task of determining which language is spoken in an audio recording, remains a difficult problem in speech processing. Short audio segments are especially challenging because they carry less contextual information and fewer distinguishing features than longer samples, leaving the model with less data to analyse. This research addresses short-duration audio in order to develop more robust and versatile LID systems that operate effectively even with minimal input. A second objective is to identify Indian languages accurately and efficiently from short-duration audio segments using convolutional neural networks (CNNs) and spectrogram representations, implemented in Python. The methodology involves several key steps. First, the audio data is pre-processed to normalize the signals and reduce noise, ensuring consistency across the dataset. Next, the audio signals are converted into spectrograms, which depict the frequency spectrum over time and so capture both the temporal and frequency characteristics essential for language discrimination. A CNN is then built and trained on these spectrograms, with an architecture designed to extract discriminative features from them. The system's performance is evaluated on a custom dataset covering three Indian languages: Hindi, Tamil, and Malayalam. In the experiments, the CNN-based model attains an accuracy of 98.9%, surpassing the performance of existing models. The proposed method has potential applications in areas such as automatic speech recognition and speaker identification, where accurate and efficient language identification is crucial.
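The pre-processing and spectrogram steps described in the abstract can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the authors' implementation: the function names (`peak_normalize`, `log_spectrogram`), the 16 kHz sample rate, the 25 ms frame and 10 ms hop, and the synthetic test tone are all assumptions chosen for the example, and the noise-reduction step is omitted.

```python
import numpy as np


def peak_normalize(signal, eps=1e-8):
    """Remove the DC offset and scale the signal to a peak amplitude of about 1."""
    signal = np.asarray(signal, dtype=np.float64)
    signal = signal - signal.mean()
    return signal / (np.abs(signal).max() + eps)


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are in samples; 400 and 160 correspond to
    25 ms and 10 ms at the assumed 16 kHz sample rate.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad very short clips so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps).T


# Hypothetical input: a one-second 440 Hz tone standing in for a speech clip.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.3 * np.sin(2 * np.pi * 440 * t)

normalized = peak_normalize(tone)
spec = log_spectrogram(normalized)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array is the kind of image-like input a CNN could be trained on, typically after adding a channel dimension and batching clips of equal length.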
