Design of the Oral English Teaching Method Based on Multimodal Feature Fusion

Xiaolei He
DOI: https://doi.org/10.1155/2022/6224608
2022-08-08
Mobile Information Systems
Abstract: To address the overly complex speech feature extraction algorithms and limited representational ability in oral English teaching, this paper proposes a speech scoring mechanism based on multimodal fusion. First, features are extracted from multimodal audio and video, and an LSTM-CTC multimodal speech error detection model is proposed. Then, the distances of MFCC, volume intensity, and pitch contour are calculated with the DTW algorithm, and a speech scoring model is established. Experimental results show that, under both noise-free and strong-noise conditions, multimodal speech detection achieves a better error detection effect, and the system's scores are close to the actual situation, which can provide new ideas for oral English teaching methods.
computer science, information systems, telecommunications
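The abstract's scoring step relies on DTW (dynamic time warping) distances between a learner's features and a reference. The following is a minimal sketch of textbook DTW on one-dimensional sequences such as a pitch or volume track. It is not the paper's implementation: the function names, the absolute-difference local cost, and the distance-to-score mapping are all assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D numeric sequences.

    Assumption: local cost is the absolute difference between frames;
    the paper may use a different cost (e.g. Euclidean over MFCC vectors).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeated
                                 cost[i][j - 1],      # b advances, a frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def similarity_score(distance, scale=1.0):
    """Hypothetical mapping from a DTW distance to a 0-100 score.

    This formula is an illustrative assumption, not taken from the paper.
    """
    return 100.0 / (1.0 + distance / scale)


# A reference pitch track and a slower rendition of the same contour
reference = [1.0, 2.0, 3.0]
learner = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(reference, learner))
```

Because DTW allows a frame in one sequence to match several frames in the other, a slower or faster rendition of the same contour is not penalized for tempo alone, which is why it suits comparing utterances of different durations.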