Implementation of a Whisper Architecture-Based Turkish Automatic Speech Recognition (ASR) System and Evaluation of the Effect of Fine-Tuning with a Low-Rank Adaptation (LoRA) Adapter on Its Performance
Hüseyin Polat, Alp Kaan Turan, Cemal Koçak, Hasan Basri Ulaş
DOI: https://doi.org/10.3390/electronics13214227
IF: 2.9
2024-10-29
Electronics
Abstract: This paper focuses on the implementation of the Whisper architecture to create an automatic speech recognition (ASR) system optimized for the Turkish language, which is considered a low-resource language in terms of speech recognition technologies. Whisper is a transformer-based model known for its high performance across numerous languages. However, its performance in Turkish, a language with unique linguistic features and limited labeled data, has yet to be fully explored. To address this, we conducted a series of experiments using five different Turkish speech datasets to assess the model's baseline performance. Initial evaluations revealed word error rates (WERs) ranging from 4.3% to 14.2%, reflecting the challenges posed by Turkish. To improve these results, we applied the low-rank adaptation (LoRA) technique, which is designed to fine-tune large-scale models efficiently by introducing a reduced set of trainable parameters. After fine-tuning, significant performance improvements were observed, with relative WER reductions of up to 52.38%. This study demonstrates that fine-tuned Whisper models can be successfully adapted for Turkish, resulting in a robust and accurate end-to-end ASR system. This research highlights the applicability of Whisper in low-resource languages and provides insights into the challenges of improving speech recognition performance in Turkish and strategies for addressing them.
engineering, electrical & electronic; computer science, information systems; physics, applied
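The abstract relies on two quantities that are easy to make concrete: the word error rate (WER) and the size of the trainable parameter set that LoRA introduces. The following is a minimal Python sketch, not the authors' evaluation code. It assumes whitespace tokenization with no text normalization, reads the reported "reduction" as a relative change with respect to the baseline WER, and uses the standard LoRA factorization in which a weight matrix of shape d_out x d_in receives two low-rank factors of rank r. The function names, the example sentence, and the layer dimensions are all hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are split on whitespace, with no casing or
    punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, fine_tuned_wer: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - fine_tuned_wer) / baseline_wer * 100.0


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical Turkish example: one word of the reference is dropped.
    print(word_error_rate("bugün hava çok güzel", "bugün hava güzel"))
    # Hypothetical WER values, not results from the paper.
    print(relative_wer_reduction(0.20, 0.10))
    # Hypothetical 1024 x 1024 projection with rank 8, versus full fine-tuning.
    print(lora_trainable_params(1024, 1024, 8), 1024 * 1024)
```

Because the relative reduction is normalized by the baseline, the same percentage corresponds to different absolute gains on datasets with different starting WERs, which matters when comparing results across the five datasets.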