AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide
2024-01-19
Abstract: Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses with microphone arrays, which fuses multi-channel ASR with serialized output training to disambiguate the wearer from the conversation partner and to suppress noise and cross-talk speech from non-target directions.
Audio and Speech Processing, Sound