High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR

Sourav Banerjee, Ayushi Agarwal, Promila Ghosh
2024-11-25
Abstract: Automatic Speech Recognition (ASR) systems in the clinical domain face significant challenges, notably the need to recognise specialised medical vocabulary accurately and meet stringent precision requirements. We introduce United-MedASR, a novel architecture that addresses these challenges by integrating synthetic data generation, precision ASR fine-tuning, and advanced semantic enhancement techniques. United-MedASR constructs a specialised medical vocabulary by synthesising data from authoritative sources such as ICD-10 (International Classification of Diseases, 10th Revision), MIMS (Monthly Index of Medical Specialties), and FDA databases. This enriched vocabulary is used to fine-tune the Whisper ASR model to better cater to clinical needs. To enhance processing speed, we incorporate Faster Whisper, ensuring streamlined and high-speed ASR performance. Additionally, we employ a customised BART-based semantic enhancer to handle intricate medical terminology, thereby increasing accuracy efficiently. Our layered approach establishes new benchmarks in ASR performance, achieving a Word Error Rate (WER) of 0.985% on LibriSpeech test-clean, 0.26% on Europarl-ASR EN Guest-test, and demonstrating robust performance on Tedlium (0.29% WER) and FLEURS (0.336% WER). Furthermore, we present an adaptable architecture that can be replicated across different domains, making it a versatile solution for domain-specific ASR systems.
Audio and Speech Processing, Computation and Language, Sound
What problem does this paper attempt to address?
This paper addresses the challenges that Automatic Speech Recognition (ASR) systems face in the clinical domain, chiefly recognizing specialized medical vocabulary accurately and meeting strict precision requirements. Specifically, it targets three issues:

1. **Recognition of specialized medical vocabulary**: General-purpose ASR systems perform poorly on domain-specific terms. In medical settings, drug names, anatomical terms, and clinical procedures require highly specialized handling, and misrecognizing them can lead to critical errors that affect patient safety.
2. **Requirement for high precision**: Medical ASR systems need not only high accuracy but also the ability to handle complex medical language patterns and contextual information. Existing ASR models perform well in everyday scenarios but often fail on complex medical terms.
3. **Data scarcity and privacy protection**: High-quality medical speech data is difficult to obtain because collecting it is costly and time-consuming and is subject to strict privacy regulations. Generating and using synthetic data effectively therefore becomes an important question.

To address these problems, the paper proposes a new architecture, United-MedASR, which improves the precision and efficiency of medical speech recognition through three components:

- **Synthetic data generation**: Data from authoritative medical sources (such as ICD-10, MIMS, and FDA databases) is used to generate high-quality synthetic speech data.
- **Precision ASR fine-tuning**: The Whisper model is fine-tuned to better fit the needs of the medical domain.
- **Advanced semantic enhancement**: A BART-based semantic enhancement module handles complex medical terms and improves transcription accuracy.

With these components, United-MedASR reports strong results on several benchmarks. For example, its Word Error Rate (WER) is 0.985% on LibriSpeech test-clean and 0.26% on Europarl-ASR EN Guest-test. The authors also present the architecture as adaptable to other domains, offering a foundation for building domain-specific ASR systems.
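To make the reported figures concrete, the sketch below shows how Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. This is a minimal illustration of the standard metric, not the paper's evaluation code. The function name `word_error_rate` and the example sentences are assumptions made for illustration, and any text normalization applied before scoring (casing, punctuation) is not shown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one misrecognized drug name in a 7-word reference.
reference = "the patient was prescribed metoprolol twice daily"
hypothesis = "the patient was prescribed metropolol twice daily"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # WER = 0.143
```

Read this way, a WER of 0.985% corresponds to roughly one word error per hundred reference words.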