Predictive Skim: Contrastive Predictive Coding for Low-Latency Online Speech Separation

Chenda Li, Yifei Wu, Yanmin Qian
DOI: https://doi.org/10.1109/icassp49357.2023.10097107
2023-01-01
Abstract: In online speech separation, there is a trade-off between inherent latency and separation performance. When processing the current input audio, looking ahead to more future context usually improves separation performance but increases algorithmic latency, and vice versa. Under extremely low latency requirements, future context is costly in terms of algorithmic latency and may not be available. In this work, we apply the contrastive predictive coding (CPC) method to the previously proposed online Skipping Memory (SkiM) speech separation model, a low-latency model for online speech separation. During training, the SkiM model is required to predict future memory states given the history of memory states. With CPC training, the predictive SkiM model shows stronger causal sequence modeling capacity in the online speech separation task. In addition, we explore a local context codec (LCC) method to reduce the computational cost and provide a qualitative analysis of it. Our best online predictive SkiM model, equipped with CPC and LCC, achieves a 15.5 dB SI-SNR improvement on the WSJ0-2mix benchmark with 3-ms actual latency measured on a single-core CPU, which should be the state-of-the-art result among causal models.
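Illustration: the abstract does not give the exact training objective, so the following is only a minimal sketch of the generic InfoNCE loss commonly used in CPC, not the authors' implementation. It assumes memory states are plain vectors, that similarity is a dot product, and that the true future state sits among a set of negative candidates; the names `info_nce`, `predicted`, `candidates`, and `temperature` are hypothetical.

```python
import math


def info_nce(predicted, candidates, positive_index, temperature=1.0):
    """Generic InfoNCE loss for one predicted vector against candidate vectors.

    Assumption: dot-product similarity; the paper's scoring function may differ.
    """
    # Similarity score between the prediction and every candidate state.
    scores = [
        sum(p * c for p, c in zip(predicted, cand)) / temperature
        for cand in candidates
    ]
    # Numerically stable log-sum-exp over all candidates.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    # Negative log-probability assigned to the true future state.
    return log_norm - scores[positive_index]


# Toy example: a prediction made from history, and three candidate future states.
predicted = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

loss_match = info_nce(predicted, candidates, positive_index=0)
loss_mismatch = info_nce(predicted, candidates, positive_index=2)
```

The loss is lower when the prediction is close to the true future state (`loss_match`) than when it points away from it (`loss_mismatch`), which is the signal that pushes a causal model to encode information about upcoming context.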