Abstract: Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their naturalistic audio properties make automatic transcription challenging. We collect a corpus of roughly 62,000 manually transcribed radio transmissions (~46 hours of audio) to evaluate the feasibility of automatic speech recognition (ASR) using modern recognition models. We evaluate the performance of off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription is challenging in this domain. Large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance. Our work suggests directions for future work, including analysis of short utterances and potential miscommunication in police radio interactions. We make our corpus and data annotation pipeline available to other researchers, to enable further research on recognition and analysis of police communication.

What problem does this paper attempt to address?

The problem this paper addresses is how to transcribe and analyze broadcast police communications (BPC) using automatic speech recognition (ASR). Specifically, the paper focuses on the following aspects:

1. **Processing of challenging audio data**:
   - Police radio communications have naturalistic audio properties, including background noise, highly specialized terms, and short phrases, which make automatic transcription very difficult.
   - The authors collected a corpus of approximately 62,000 manually transcribed radio transmissions (about 46 hours of audio) to evaluate the feasibility of modern ASR models in this domain.
2. **Performance evaluation of existing models**:
   - The authors evaluated off-the-shelf ASR models, models fine-tuned on BPC data, and custom end-to-end models.
   - The results show that large off-the-shelf ASR models perform poorly in this domain, but fine-tuned models can reach the approximate range of human performance.
3. **Provision of research resources**:
   - To support future research, the authors made their corpus and data annotation pipeline public, so that other researchers can further study and improve the recognition and analysis of police communications.

### Specific problems and solutions

- **Problem 1: How do existing ASR models perform on BPC?**
  - **Solution**: Experiments show that large ASR models used without fine-tuning (such as Whisper large-v2 and large-v3) have a high word error rate (WER) on BPC, reaching 57.4% and 51.4% respectively. In contrast, the fine-tuned NeMo Conformer CTC model reduces the WER to 27.7%, close to the human level.
- **Problem 2: How can the performance of ASR models on BPC be improved?**
  - **Solution**: Fine-tuning the pre-trained NeMo model significantly improved its performance. The authors also explored different feature extractors (such as log Mel-filterbank features, HuBERT Large, and WavLM Large) and incorporated self-supervised learning methods to further improve model performance.
- **Problem 3: What is the impact of audio quality and utterance length on ASR performance?**
  - **Solution**: The paper reports that audio quality and utterance length are each only weakly correlated with WER. The fine-tuned model performs better with respect to both factors, which suggests that fine-tuning can reduce the impact of noise on model performance.

### Conclusion

This research provides a preliminary basis for the automated analysis of police radio communications, establishes the baseline performance of current ASR models in this domain, and points out directions for future research. By releasing the corpus and data annotation pipeline, the authors hope to promote more work on police communication analysis, so as to better understand police behavior and decision-making processes.
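The WER figures quoted in the summary are ratios of word-level edits to reference words. As a minimal sketch of how such a score is computed, the function below counts substitutions, deletions, and insertions with a standard Levenshtein alignment. The function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for illustration; the text normalization actually used in the paper is not described here, and the example utterances are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented examples, not taken from the corpus.
print(word_error_rate("ten four copy that", "ten for copy"))  # one substitution, one deletion
print(word_error_rate("copy", "copy that"))                   # one insertion on a one-word reference
```

Because the denominator is the reference length, a single inserted word on a one-word transmission already yields a WER of 100%, which is one reason very short utterances can weigh heavily on this metric.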

Speech Recognition for Analysis of Police Radio Communication

Developing Speech Processing Pipelines for Police Accountability

Datastore Design for Analysis of Police Broadcast Audio at Scale

Race and Privacy in Broadcast Police Communications

Speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features

Improving Speech Recognition for African American English With Audio Classification

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition

RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain

Multimodal Speech Recognition Using EEG and Audio Signals: A Novel Approach for Enhancing ASR Systems

Speech recognition for medical conversations

ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment

Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models

Detecting Institutional Dialog Acts in Police Traffic Stops

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

Call-sign recognition and understanding for noisy air-traffic transcripts using surveillance information

Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

Automatic Speech Recognition Post-Processing for Readability: Task, Dataset and a Two-Stage Pre-Trained Approach

Towards Recognition for Radio-Echo Speech in Air Traffic Control: Dataset and a Contrastive Learning Approach

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

mmWave-Whisper: Phone Call Eavesdropping and Transcription Using Millimeter-Wave Radar

ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems