Speech Recognition for Analysis of Police Radio Communication

Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul
2024-09-17
Abstract: Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their naturalistic audio properties make automatic transcription challenging. We collect a corpus of roughly 62,000 manually transcribed radio transmissions (~46 hours of audio) to evaluate the feasibility of automatic speech recognition (ASR) using modern recognition models. We evaluate the performance of off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription is challenging in this domain. Large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance. Our work suggests directions for future work, including analysis of short utterances and potential miscommunication in police radio interactions. We make our corpus and data annotation pipeline available to other researchers, to enable further research on recognition and analysis of police communication.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to transcribe and analyze broadcast police communications (BPC) with automatic speech recognition (ASR). Specifically, the paper focuses on the following aspects:

1. **Processing of challenging audio data**:
   - Police radio communications have naturalistic audio properties, including background noise, highly specialized terms, and short phrases, which make automatic transcription very difficult.
   - The paper collects a corpus of approximately 62,000 manually transcribed radio transmissions (about 46 hours of audio) to evaluate the feasibility of modern ASR models in this domain.
2. **Performance evaluation of existing models**:
   - The researchers evaluate off-the-shelf ASR models, models fine-tuned on BPC data, and custom end-to-end models.
   - The results show that existing large ASR models perform poorly in this domain, but fine-tuned models can reach the approximate range of human performance.
3. **Provision of research resources**:
   - To support future research, the researchers make their corpus and data annotation pipeline public, so that others can study and improve the recognition and analysis of police communications.

### Specific problems and solutions

- **Problem 1: How do existing ASR models perform on BPC?**
  - **Solution**: Experiments show that large ASR models used without fine-tuning (such as Whisper large-v2 and large-v3) have a high word error rate (WER) on BPC, reaching 57.4% and 51.4% respectively. In contrast, the fine-tuned NeMo Conformer CTC model reduces the WER to 27.7%, close to human performance.
- **Problem 2: How can the performance of ASR models on BPC be improved?**
  - **Solution**: Fine-tuning the pre-trained NeMo model significantly improves performance. The researchers also explore different feature extractors (such as log Mel-filterbank features, HuBERT Large, and WavLM Large) and self-supervised learning methods to improve the models further.
- **Problem 3: What is the impact of audio quality and utterance length on ASR performance?**
  - **Solution**: Audio quality and utterance length each correlate only weakly with WER. The fine-tuned model performs better with respect to both factors, which suggests that fine-tuning can reduce the impact of noise on model performance.

### Conclusion

This research provides a preliminary basis for the automated analysis of police radio communications, establishes baseline performance for current ASR models in this domain, and points out directions for future research. By releasing the corpus and data annotation pipeline, the researchers hope to encourage more work on police communication analysis, so as to better understand police behavior and decision-making processes.
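The summary reports results as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that metric in Python, not the paper's evaluation code. The function name `word_error_rate` and the example transcripts are made up for illustration and are not taken from the corpus, and it assumes text is already normalized and split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); assumes whitespace tokenization
    and no text normalization beyond what the caller has already applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("respond" -> "responding") and one
# deletion ("to") against a six-word reference.
example_wer = word_error_rate(
    "unit twelve respond to main street",
    "unit twelve responding main street",
)
```

Because the denominator is the reference length, a single wrong word in a two-word transmission already yields a WER of 50%, which is one reason short utterances make this metric volatile at the utterance level.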