The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities

Stefano Giacomelli, Marco Giordano, Claudia Rinaldi
2024-10-05
Abstract: This paper explores a structured application of the One-Class approach and the One-Class-One-Network model for supervised classification tasks, focusing on vowel phoneme classification and speaker recognition for the Automatic Speech Recognition (ASR) domain. For our case study, the ASR model runs on a proprietary sensing and lighting system, used to monitor acoustic and air pollution on urban streets. We formalize combinations of pseudo-Neural Architecture Search and Hyper-Parameters Tuning experiments, using an informed grid-search methodology, to achieve classification accuracy comparable to today's most complex architectures, and we examine the speaker recognition and energy efficiency aspects. Despite its simplicity, our proposed model has a very good chance of generalizing across language and speaker-gender contexts for widespread applicability in computationally constrained contexts, as supported by relevant statistical and performance metrics. Our experiment code is openly accessible on our GitHub.
Sound, Artificial Intelligence, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses how to perform efficient, energy-saving speech recognition tasks for acoustic monitoring in a smart city environment. Specifically, it explores the "One-Class-One-Network" (OCON) model for supervised classification tasks, in particular vowel phoneme classification and speaker recognition. Its main objectives are:

1. **Improve classification accuracy**: Use pseudo-Neural Architecture Search (pseudo-NAS) and Hyper-Parameters Tuning (HPs-T) to bring the OCON model's classification accuracy close to that of today's most complex architectures.
2. **Reduce computational complexity**: Design a shallow, optimizable sub-architecture that can run in a computationally constrained environment while maintaining high classification performance.
3. **Achieve gender recognition**: Explore adding speaker gender recognition to speech recognition to improve context understanding, personalized responses, and behavior analysis.
4. **Energy efficiency**: Experimentally evaluate the model's energy consumption and carbon emissions to assess its sustainability in practical applications.

The paper evaluates the OCON model through a series of experiments and reports detailed results and performance metrics. The findings are relevant to acoustic and environmental monitoring in smart cities.
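The one-network-per-class idea behind OCON can be illustrated with a minimal sketch. This is not the authors' implementation: each "network" here is a single logistic unit standing in for a shallow sub-network, the data are synthetic 2-D clusters rather than vowel features, and every function name (`train_binary_unit`, `train_ocon`, `predict_ocon`) is hypothetical. It only shows the general pattern of training one binary classifier per class and picking the class whose classifier scores highest.

```python
import numpy as np


def train_binary_unit(X, y, lr=0.1, epochs=200):
    """Fit one logistic unit by gradient descent.

    Assumption: this single unit stands in for a shallow per-class
    sub-network; the paper's actual sub-architecture is not reproduced.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
        grad = p - y                            # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def train_ocon(X, labels, n_classes):
    """Train one independent binary unit per class (class k vs. the rest)."""
    return [
        train_binary_unit(X, (labels == k).astype(float))
        for k in range(n_classes)
    ]


def predict_ocon(units, X):
    """Score every sample with every unit and return the best-scoring class."""
    scores = np.stack([X @ w + b for w, b in units], axis=1)
    return scores.argmax(axis=1)


# Synthetic stand-in data: three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, -2.0], [-4.0, -2.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + 0.5 * rng.standard_normal((150, 2))

units = train_ocon(X, labels, n_classes=3)
accuracy = (predict_ocon(units, X) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Because the per-class units are trained independently, each one can in principle be fitted, tuned, or deployed separately, which is one plausible reading of the "distributable" property named in the title.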