A Pipelined Framework with Serialized Output Training for Overlapping Speech Recognition

Tao Li, Lingyan Huang, Feng Wang, Song Li, Qingyang Hong, Lin Li
DOI: https://doi.org/10.1007/978-981-99-2401-1_10
2023-01-01
Abstract: Far-field conditions, noise, reverberation, and overlapping speech make the cocktail party problem one of the greatest challenges in speech recognition. In this paper, we focus on overlapping speech and present a pipelined architecture with serialized output training (SOT). The baseline and the proposed methods are evaluated on artificially mixed speech datasets generated from the AliMeeting corpus. Experimental results demonstrate that the proposed model outperforms the baseline even at a high overlap ratio, with relative character error rate (CER) improvements of 10.8% at an overlap ratio of 0.5 and 4.9% on average.
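To make the reported metric concrete, the following minimal sketch shows how a character error rate and a relative improvement between two systems are commonly computed. It is not the authors' evaluation code: the function names (`edit_distance`, `cer`, `relative_reduction`) and the example values are hypothetical, and the relative figure is assumed to mean (baseline CER - proposed CER) / baseline CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer, proposed_cer):
    """Relative CER reduction of the proposed system over the baseline (assumed definition)."""
    return (baseline_cer - proposed_cer) / baseline_cer


# Hypothetical values for illustration only, not results from the paper.
example_cer = cer("abcd", "abcf")              # one substitution over four characters
example_gain = relative_reduction(0.20, 0.18)  # illustrative baseline vs. proposed CER
```

Under this definition, a relative improvement is always measured against the baseline's own error rate, so the same absolute CER difference yields a larger relative figure when the baseline CER is lower.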