Continuous speech separation: Dataset and analysis

Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li
2020-05-04
Abstract: This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior speech separation studies use pre-segmented audio signals, which are typically generated by mixing speech utterances on computers so that they fully overlap. Also, the separation algorithms have often been evaluated based on signal-based metrics such as signal-to-distortion ratio. However, in natural conversations, speech signals are continuous and contain both overlapped and overlap-free regions. In addition, signal-based metrics are only weakly correlated with automatic speech recognition (ASR) accuracy. Not only does this make it hard to assess the practical relevance of the tested algorithms, it also hinders researchers from developing systems that can be readily applied to real scenarios. In this paper, we define continuous speech separation (CSS) as a task of generating a set of non …
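For reference, the signal-to-distortion ratio mentioned above compares the energy of a reference source with the energy of the error in a separated estimate. The sketch below uses a simplified form, 10·log10(‖s‖² / ‖s − ŝ‖²), and does not fit the distortion filter that the BSS Eval definition of SDR allows. The function name `simple_sdr` is hypothetical. The code illustrates the class of signal-based metric the abstract critiques. It is not the paper's evaluation protocol.

```python
import math
from typing import Sequence


def simple_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Simplified signal-to-distortion ratio in dB (hypothetical helper).

    Computes 10 * log10(||s||^2 / ||s - s_hat||^2). Unlike BSS Eval's SDR,
    no distortion filter is fitted, so any gain or filtering mismatch
    between reference and estimate is counted as distortion.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the clean reference source.
    signal_energy = sum(s * s for s in reference)
    # Energy of the residual between reference and separated estimate.
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)
```

For example, scaling a reference by 0.9 leaves an error with 1/100 of the reference energy, so `simple_sdr([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9])` is about 20 dB. A score like this says nothing directly about how many words an ASR system would recognize correctly, which is the gap the abstract points to.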