NResNet: nested residual network based on channel and frequency domain attention mechanism for speaker verification in classroom
Qiuyu Zheng, Zengzhao Chen, Xinxing Jiang, Mengting Lin, Mengke Wang, Yuanyuan Lu
DOI: https://doi.org/10.1007/s11042-024-19588-9
IF: 2.577
2024-06-15
Multimedia Tools and Applications
Abstract: With the development of deep learning, the application of artificial intelligence in education has attracted increasing attention. However, most existing methods for analyzing verbal interaction in the classroom are still semi-manual and lack both intelligence and routine applicability. We therefore propose a nested residual network with multi-scale aggregation and a speaker attention mechanism, which distinguishes the speech of teachers and students by identifying audio clips recorded in the classroom, so that the teaching mode can be analyzed through the verbal interaction between teachers and students. Existing speaker verification methods do not adapt well to the classroom scene, for two reasons: the language environment is inconsistent, and the speaker distribution differs. To address this, we propose a deep multi-scale aggregation residual network model that preserves the validity of voiceprint information to the greatest extent. A speaker attention mechanism that combines channel-domain and frequency-domain information is introduced to capture the differences in pronunciation habits and voiceprint amplitude between teachers and students. Experimental results demonstrate that the proposed method has strong learning capacity and outperforms state-of-the-art methods: on the public English dataset LibriSpeech, it obtains a 6.20% accuracy improvement and a 4.00% equal error rate improvement over the compared methods. To adapt to Chinese classrooms, we also show, through training performance on the Chinese dataset AISHELL, that the proposed method has good cross-language adaptability. Experimental results in Chinese classrooms show that the proposed method achieves an improvement of up to 22.70% over the other methods. Our project will be publicly available at http://ecourse.nercel.com.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
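
The abstract reports results in terms of equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER is commonly estimated from verification scores; it is not taken from the paper. The function name `equal_error_rate`, the threshold sweep over observed scores, and the example scores are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Estimate the EER and its threshold from similarity scores.

    genuine:  scores for same-speaker trials (higher means more similar).
    impostor: scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best_gap = float("inf")
    best_eer = 1.0
    best_thr = 0.0
    # Sweep every observed score as a candidate threshold (an illustrative
    # choice; evaluation toolkits may interpolate the curve instead).
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Hypothetical scores, for illustration only.
eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {thr}")
```

With only a handful of trials the two error rates may never coincide exactly, so the sketch returns their average at the threshold where they are closest.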