Luganda Speech Intent Recognition for IoT Applications

Andrew Katumba, Sudi Murindanyi, John Trevor Kasule, Elvis Mugume
2024-05-16
Abstract: The advent of Internet of Things (IoT) technology has generated massive interest in voice-controlled smart homes. While many voice-controlled smart home systems are designed to understand and support widely spoken languages like English, speakers of low-resource languages like Luganda may need more support. This research project aimed to develop a Luganda speech intent classification system for IoT applications to integrate local languages into smart home environments. The project uses hardware components such as Raspberry Pi, Wio Terminal, and ESP32 nodes as microcontrollers. The Raspberry Pi processes Luganda voice commands, the Wio Terminal is a display device, and the ESP32 nodes control the IoT devices. The ultimate objective of this work was to enable voice control using Luganda, which was accomplished through a natural language processing (NLP) model deployed on the Raspberry Pi. The NLP model utilized Mel Frequency Cepstral Coefficients (MFCCs) as acoustic features and a Convolutional Neural Network (Conv2D) architecture for speech intent classification. A dataset of Luganda voice commands was curated for this purpose and has been released as open source. This work addresses the localization challenges and linguistic diversity in IoT applications by incorporating Luganda voice commands, enabling users to interact with smart home devices without English proficiency, especially in regions where local languages are predominant.
Sound, Artificial Intelligence, Computation and Language, Audio and Speech Processing
What problem does this paper attempt to address?
The main problem this paper addresses is the insufficient support for low-resource languages (such as Luganda) in existing voice-controlled smart home systems. Specifically:

1. **Language support issues**: Most existing voice control systems mainly support widely spoken languages (such as English), while low-resource languages like Luganda lack sufficient support, so speakers of these languages face obstacles when interacting with smart devices.
2. **Localization challenges**: In regions such as Uganda, where local languages are predominant, systems that assume English proficiency cannot meet the needs and preferences of the local population. A system that can understand and process Luganda voice commands is therefore needed to improve user experience and accessibility.

To solve these problems, this research project aims to develop a Luganda speech intent classification system for Internet of Things (IoT) applications. This goal is achieved in the following ways:

- **Hardware integration**: Use Raspberry Pi, Wio Terminal, and ESP32 nodes as microcontrollers, where the Raspberry Pi processes Luganda voice commands, the Wio Terminal serves as a display device, and the ESP32 nodes control the IoT devices.
- **Natural language processing (NLP) model**: Deploy an NLP model based on a convolutional neural network (CNN), using Mel Frequency Cepstral Coefficients (MFCCs) as acoustic features for speech intent classification.
- **Dataset creation**: Collect and organize a dataset of Luganda voice commands containing 20 different intents and make it open source for subsequent research and verification.
- **Model optimization and deployment**: Reduce the model size through quantization so that it can run efficiently on resource-constrained edge devices while maintaining high accuracy and reliability.

Finally, this research not only addresses the recognition of Luganda voice commands in the smart home environment but also provides a reference for voice recognition in other low-resource languages.
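The summary mentions quantization as the way the model is shrunk for edge deployment, but it does not say which scheme or toolkit was used. As an illustration only, the following is a minimal sketch of generic 8-bit affine (asymmetric) weight quantization in plain Python. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map float weights to int8 values using an affine scale and zero point.

    Generic illustration; not the quantization scheme used by the authors.
    """
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    if w_max == w_min:
        # All weights are zero: any scale works, so pick 1.0.
        return [0] * len(weights), 1.0, 0

    # One int8 step corresponds to `scale` units of the float range.
    scale = (w_max - w_min) / 255.0
    # Integer that represents the float value 0.0.
    zero_point = round(-128 - w_min / scale)

    quantized = [
        max(-128, min(127, round(w / scale) + zero_point))  # clamp to int8
        for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from their int8 representation."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    # Hypothetical weights standing in for one layer of a trained model.
    sample = [0.2, -0.7, 0.0, 1.3]
    q, scale, zp = quantize_int8(sample)
    restored = dequantize(q, scale, zp)
    for original, approx in zip(sample, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Storing each weight in one byte instead of a 4-byte float reduces weight storage roughly fourfold, at the cost of a small rounding error per weight. Whether accuracy is preserved has to be checked empirically on the target model.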