Convolutional neural network based language identification system: A spectrogram based approach
Himani Tomar, Deepti Deshwal, Neelu Trivedi
DOI: https://doi.org/10.1007/s11042-024-20283-y
IF: 2.577
2024-10-05
Multimedia Tools and Applications
Abstract: Automatic language identification (LID), the task of determining which language is spoken in an audio source, is a difficult problem in speech processing. Short audio segments are especially challenging because they carry less contextual information and fewer distinguishing features than longer samples, leaving the model with less data to analyse. This research addresses that challenge with the aim of developing more robust and versatile LID systems that operate effectively even with minimal input. A further objective is to identify Indian languages accurately and efficiently from short-duration audio segments using convolutional neural networks (CNNs) and spectrogram representations, implemented in Python. The methodology involves several key steps. First, the audio data is pre-processed to normalize the signals and reduce noise, ensuring consistency across the dataset. Next, the audio signals are converted into spectrograms, which offer a visual depiction of the frequency spectrum over time and capture both the temporal and the frequency characteristics essential for language discrimination. A CNN, with an architecture designed to extract significant features from the spectrograms, is then built and trained on them. The system's performance is evaluated on a custom dataset of three Indian languages: Hindi, Tamil, and Malayalam. Experimental results show that the CNN-based model attains 98.9% accuracy, surpassing the performance of existing models. The proposed method has potential applications in areas such as automatic speech recognition and speaker identification, where accurate and efficient language identification is crucial.
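The pre-processing and spectrogram steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the sample rate, window type, FFT size, or hop length, so the values below (16 kHz, Hann window, 512-point FFT, hop of 128 samples) are assumptions chosen only for illustration, as are the helper names `peak_normalize` and `log_spectrogram`. The noise-reduction step and the CNN itself are not reproduced here.

```python
import numpy as np


def peak_normalize(signal, eps=1e-9):
    """Remove the DC offset and scale the waveform to a peak amplitude of about 1."""
    centered = signal - np.mean(signal)
    return centered / (np.max(np.abs(centered)) + eps)


def log_spectrogram(signal, n_fft=512, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames).

    n_fft and hop are illustrative defaults, not values taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # The real-input FFT of each frame gives the one-sided magnitude spectrum.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels and put frequency on the first axis, time on the second.
    return 20.0 * np.log10(magnitude + eps).T


# Synthetic stand-in for a short utterance: a 1 kHz tone with a DC offset.
sample_rate = 16000  # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of audio
tone = 0.3 * np.sin(2 * np.pi * 1000 * t) + 0.1

normalized = peak_normalize(tone)
spec = log_spectrogram(normalized)
peak_bin = int(np.argmax(spec.mean(axis=1)))
print(spec.shape, peak_bin * sample_rate / 512)
```

A two-dimensional array like `spec` is the kind of image-like input a CNN classifier would be trained on. In practice a dedicated audio library would likely replace the hand-written framing, and the parameters would be tuned to the segment length.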
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering