Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing

Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak
2023-05-01
Abstract: Non-speech emotion recognition has a wide range of applications, including healthcare, crime control and rescue, and entertainment. Providing these applications using edge computing has great potential; however, recent studies focus on speech emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely on edge computing to analyse emotions conveyed through non-speech expressions like screaming and crying. In particular, we explore knowledge distillation to design a computationally efficient system that can be deployed on edge devices with limited resources without degrading the performance significantly. We comprehensively evaluate our proposed framework using two publicly available datasets and highlight its effectiveness by comparing the results with the well-known MobileNet model. Our results demonstrate the feasibility and effectiveness of using edge computing for non-speech emotion detection, which can potentially improve applications that rely on emotion detection in communication networks. To the best of our knowledge, this is the first work on an edge-computing-based framework for detecting emotions in non-speech audio, offering promising directions for future research.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the detection of emotions in non-speech audio, such as screaming and crying, using edge computing in communication networks. Most current emotion recognition research focuses on speech and relies on complex deep-learning architectures. Non-speech emotion recognition is equally important, particularly in public safety, healthcare, rescue services, and entertainment. These applications need real-time, low-latency responses that conventional cloud-based processing may not deliver, and transmitting raw audio to remote servers raises privacy concerns.
To address these problems, the paper proposes a non-speech emotion recognition system based on edge computing. The system uses knowledge distillation to obtain a computationally efficient model that can be deployed on resource-constrained edge devices without significantly degrading performance. The framework is evaluated on two public datasets and compared with the well-known MobileNet model.
The main contributions of the paper are:
1. Using edge computing to design a low-latency non-speech emotion recognition system suitable for resource-constrained devices.
2. Developing a computationally efficient non-speech emotion detection system through knowledge distillation.
3. Discussing the potential of non-speech emotion recognition in applications such as healthcare and rescue services.
4. Demonstrating the effectiveness of the proposed framework through experimental evaluation on two public datasets.
The results show that the proposed model outperforms the MobileNetV3-small model while being more computationally efficient.
Through these contributions, this paper not only addresses the limitations of existing emotion recognition systems but also provides a new direction for future non-speech emotion recognition research.
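The summary names knowledge distillation but does not state the loss the authors used. As a rough illustration only, the sketch below shows the widely used temperature-scaled formulation, in which a small student model is trained to match a larger teacher's softened class probabilities in addition to the true label. The function names, the `temperature` and `alpha` values, and the three-class logits are all hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft (teacher-matching) and a hard (label) term.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    log_teacher = log_softmax(teacher_logits, temperature)
    log_student = log_softmax(student_logits, temperature)
    # Soft term: KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_teacher, log_student))
    # Hard term: ordinary cross-entropy of the student against the true label.
    hard = -log_softmax(student_logits)[true_label]
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard


# Hypothetical logits for three classes (e.g. scream, cry, neutral).
teacher = [2.0, 0.5, -1.0]
print(distillation_loss([1.5, 0.7, -0.8], teacher, true_label=0))
```

When the student's logits equal the teacher's, the soft term vanishes and only the label cross-entropy remains, which is one quick way to sanity-check an implementation of this kind.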