Abstract: In ideal human-computer interaction (HCI), the colloquial form of a language would be preferred by most users, since it is the form used in their day-to-day conversations. However, there is also an undeniable necessity to preserve the formal literary form. By embracing the new and preserving the old, both service to the common man (practicality) and service to the language itself (conservation) can be rendered. Hence, it is ideal for computers to have the ability to accept, process, and converse in both forms of the language, as required. To address this, it is first necessary to identify the form of the input speech, which in the current work is either literary or colloquial Tamil speech. Such a front-end system must consist of a simple, effective, and lightweight classifier trained on a few features that are capable of capturing the underlying patterns of the speech signal. To accomplish this, a one-dimensional convolutional neural network (1D-CNN) that learns the envelope of features across time is proposed. The network is trained initially on a select number of handcrafted features, and then on Mel frequency cepstral coefficients (MFCC) for comparison. The handcrafted features were selected to address various aspects of speech such as the spectral and temporal characteristics, prosody, and voice quality. The features are first analyzed by considering ten parallel utterances and observing the trend of each feature with respect to time. The proposed 1D-CNN, trained using the handcrafted features, offers an F1 score of 0.9803, while the same network trained on the MFCC offers an F1 score of 0.9895. In light of this, feature ablation and feature combination are explored. When the best-ranked handcrafted features from the feature ablation study are combined with the MFCC, they offer the best results, with an F1 score of 0.9946.
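
The idea of a 1D convolution that slides along the time axis of a feature sequence can be illustrated with a minimal pure-Python sketch. It is not the network proposed in the paper: the function names, the toy frames, the two feature channels, and the kernel weights are all assumptions chosen for illustration, and a real 1D-CNN would learn many such kernels during training.

```python
from typing import List

Frame = List[float]  # one time frame: one value per feature channel


def conv1d_over_time(frames: List[Frame], kernel: List[Frame], bias: float = 0.0) -> List[float]:
    """Valid (unpadded) 1D convolution along the time axis, followed by ReLU.

    frames: T frames, each holding C feature values.
    kernel: K frames of C weights; it slides over time and spans all channels.
    Returns T - K + 1 activations.
    """
    k = len(kernel)
    out = []
    for t in range(len(frames) - k + 1):
        acc = bias
        for dt in range(k):
            for c, w in enumerate(kernel[dt]):
                acc += w * frames[t + dt][c]
        out.append(max(0.0, acc))  # ReLU
    return out


def global_max_pool(activations: List[float]) -> float:
    """Collapse the time axis by keeping the strongest response."""
    return max(activations)


# Toy input (assumption): 6 frames x 2 feature channels in arbitrary units.
frames = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [2.0, 1.0], [1.0, 1.0]]
# Hypothetical kernel: responds to a rising trend in channel 0, ignores channel 1.
kernel = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]

activations = conv1d_over_time(frames, kernel)
pooled = global_max_pool(activations)
print(activations)  # [2.0, 2.0, 0.0, 0.0]
print(pooled)       # 2.0
```

The kernel fires while channel 0 is rising and stays at zero once it falls, which is the sense in which a 1D convolution can pick up the shape of a feature contour over time.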

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to distinguish between the literary form (Literary Tamil, LT) and the colloquial form (Colloquial Tamil, CT) of Tamil. The two forms differ significantly in vocabulary and acoustic characteristics: the colloquial form is used in daily conversation, while the literary form is used in formal texts. To achieve this goal, the author proposes a feature-engineering method based on a one-dimensional convolutional neural network (1D-CNN) for classification.

### Specific Problem Description

1. **Identification of Language Forms**:
   - In an ideal human-computer interaction (HCI) system, users are more inclined to use the colloquial form for communication. However, for a language as long-standing and rich as Tamil, protecting its literary form is also important.
   - To balance practicality and language protection, the computer needs to be able to process and identify both forms of language input. Therefore, the primary task is to develop a front-end system that can accurately distinguish between Literary Tamil and Colloquial Tamil.
2. **Effective Feature Extraction**:
   - To build such a classifier, features that effectively capture the underlying patterns of speech signals must be selected. These features should reflect the differences between Literary Tamil and Colloquial Tamil.
   - The author proposes to use hand-crafted features and to learn the time-series information of these features through a one-dimensional convolutional neural network (1D-CNN).
3. **Performance Optimization**:
   - To improve the performance of the classifier, the author uses not only hand-crafted features but also Mel-frequency cepstral coefficients (MFCC) for comparison, and finally explores the effect of combining the two feature sets.
   - Through a feature ablation study, the author quantifies and ranks the contribution of each hand-crafted feature, thereby further optimizing the performance of the classifier.

### Solutions

- **Feature Selection**:
  - The hand-crafted features are: fundamental frequency (F0), energy, voicing probability, jitter, derivative of jitter, shimmer, harmonic-to-noise ratio (HNR), spectral flux, psychoacoustic sharpness, and zero-crossing rate.
- **Model Architecture**:
  - A one-dimensional convolutional neural network (1D-CNN) learns the time-series information of these features. The advantage of a 1D-CNN is that it can learn complex patterns directly from one-dimensional data and has low computational complexity, which makes it suitable for real-time applications.
- **Experimental Results**:
  - The 1D-CNN using hand-crafted features achieved an F1 score of 0.9803, and the 1D-CNN using MFCC features achieved an F1 score of 0.9895. After combining the best hand-crafted features with the MFCC features, the F1 score improved further to 0.9946.

Through these methods, the paper addresses the classification of Literary Tamil and Colloquial Tamil and provides a reference for future research.
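
The ablation-and-ranking step described above can be sketched as a leave-one-feature-out loop. This is only an illustration under stated assumptions, not the paper's procedure: `rank_by_ablation`, `toy_evaluate`, and the `TOY_GAIN` values are hypothetical, and in practice `evaluate` would retrain the classifier on the given feature subset and return its F1 score on held-out data.

```python
from typing import Callable, List, Sequence, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written directly in terms of confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def rank_by_ablation(
    features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Drop each feature in turn and rank features by the resulting score drop."""
    baseline = evaluate(features)
    drops = []
    for name in features:
        reduced = [f for f in features if f != name]
        drops.append((name, baseline - evaluate(reduced)))
    # Largest drop first: removing that feature hurt the score the most.
    return sorted(drops, key=lambda item: item[1], reverse=True)


# Placeholder contributions (assumption): made-up numbers, NOT results from the paper.
TOY_GAIN = {"f0": 0.25, "energy": 0.5, "zcr": 0.125}


def toy_evaluate(subset: Sequence[str]) -> float:
    """Hypothetical stand-in for 'train on this subset and return the F1 score'."""
    return sum(TOY_GAIN[name] for name in subset)


ranking = rank_by_ablation(["f0", "energy", "zcr"], toy_evaluate)
print([name for name, _ in ranking])  # ['energy', 'f0', 'zcr']
print(f1_score(tp=8, fp=2, fn=2))     # 0.8
```

The top-ranked names from such a loop are the candidates one would then concatenate with the MFCC for the combined-feature experiment.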

A Feature Engineering Approach for Literary and Colloquial Tamil Speech Classification using 1D-CNN

Literary and Colloquial Dialect Identification for Tamil using Acoustic Features

Literary and Colloquial Tamil Dialect Identification

An Interpretable and Generalizable Speech Detector Based on a CNN-LSTM Framework

Automated Dysarthria Severity Classification: A Study on Acoustic Features and Deep Learning Techniques

Quartered Spectral Envelope and 1D-CNN-based Classification of Normally Phonated and Whispered Speech

Optimally configured convolutional neural network for Tamil Handwritten Character Recognition by improved lion optimization model

A focus module-based lightweight end-to-end CNN framework for voiceprint recognition

An Automatic Tamil Speech Recognition system by using Bidirectional Recurrent Neural Network with Self-Organizing Map

Convolutional neural network based language identification system: A spectrogram based approach

Classification of Bangla Compound Characters Using a HOG-CNN Hybrid Model

One-dimensional convolutional neural network and hybrid deep-learning paradigm for classification of specific language impaired children using their speech

Analysis of influencing features with spectral feature extraction and multi-class classification using deep neural network for speech recognition system

A novel nearest interest point classifier for offline Tamil handwritten character recognition

A stacked convolutional neural network framework with multi-scale attention mechanism for text-independent voiceprint recognition

An Attention Ensemble Approach for Efficient Text Classification of Indian Languages

Deep Learning Speech Synthesis Model for Word/Character-Level Recognition in the Tamil Language

A deep learning approach to dysarthric utterance classification with BiLSTM-GRU, speech cue filtering, and log mel spectrograms

Image Classification and Text Extraction using Machine Learning

Text-independent voiceprint recognition via compact embedding of dilated deep convolutional neural networks

Speech Recognition using Convolution Deep Neural Networks