VHF Speech Recognition Model Based on Improved Conformer Structure

Han Meng, Mingyang Pan, Zongying Liu, Jingfeng Hu, Ruolan Zhang, Yu Li
DOI: https://doi.org/10.1109/aicit62434.2024.10730458
2024-01-01
Abstract: This paper presents a VHF Speech Recognition model (VHFSR) based on a modified Conformer architecture. It aims to address the poor communication quality of very high frequency (VHF) voice communication in the navigation domain. Speech recognition transforms acoustic features into text sequences, and VHFSR accurately identifies the content of maritime VHF voice communication. We selected the WeNet speech recognition model as the baseline and, through a series of systematic studies, found that some of the Conformer architecture's design choices were suboptimal. We therefore modified the Conformer encoder in WeNet to obtain the VHFSR model, which consistently outperforms the WeNet model under the same training scheme. Specifically, VHFSR adds a second multi-head attention module to the Conformer block, transforming the architecture into an FMCMF form rather than the Macaron structure proposed in Conformer. The model effectively addresses VHF speech recognition challenges in the navigation domain. Using a custom VHF dataset combined with the AISHELL dataset, and MFCC as the acoustic feature, it achieves a final Character Error Rate (CER) of 2.53% for speech stream recognition.
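For readers unfamiliar with the metric, CER is conventionally the character-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference characters. The sketch below illustrates that standard definition only; the abstract does not describe the authors' scoring script, and the function names `edit_distance` and `cer` are illustrative assumptions, not part of WeNet or VHFSR.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(f"{cer('abcd', 'abxd'):.2%}")
```

Under this definition, a CER of 2.53% corresponds to roughly 2.5 character errors per 100 reference characters. Because insertions are counted, CER can exceed 100% for very poor hypotheses.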