Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings

Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei, Hongbin Suo, Jinwei Feng, Zhijie Yan
DOI: https://doi.org/10.21437/interspeech.2021-747
2021-01-01
Abstract: In this paper, we propose an overlapping speech detection (OSD) system for real multiparty meetings. Unlike previous work on single-channel recordings or simulated data, we conduct our research on real multi-channel data recorded with an 8-microphone array. We investigate how the spatial information provided by multi-channel beamforming can benefit OSD. Specifically, we propose a two-stream DFSMN that jointly models acoustic and spatial features. Instead of performing frame-level OSD, we perform segment-level OSD, and we propose an attention pooling layer to model speech segments of variable length. Experimental results show that the two-stream DFSMN with attention pooling effectively models acoustic-spatial features and significantly boosts OSD performance, yielding a 3.5% absolute improvement in detection accuracy (from 85.57% to 89.12%) over the baseline system.
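The abstract does not give the equations of the attention pooling layer, so the following is only a minimal sketch of generic single-head attention pooling, under stated assumptions, showing how a variable-length segment can be reduced to a fixed-size vector for segment-level classification. Assumptions (none of these come from the paper): the DFSMN layers are omitted and per-frame embeddings are taken as given; each frame is scored by a hypothetical linear scorer with weights `w` and bias `b`; the two streams are each pooled separately and then concatenated. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a 1-D list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]],
                   w: Sequence[float], b: float = 0.0) -> Vector:
    """Collapse a T x D sequence of frame embeddings into one D-dim vector.

    Assumed scorer: e_t = w . h_t + b (a hypothetical linear layer).
    The weights alpha = softmax(e) sum to 1 over the T frames, so the
    output dimension D does not depend on the segment length T.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    scores = [sum(wi * hi for wi, hi in zip(w, h)) + b for h in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of frames, one output dimension at a time.
    return [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]


def two_stream_segment_embedding(acoustic: Sequence[Sequence[float]],
                                 spatial: Sequence[Sequence[float]],
                                 w_ac: Sequence[float],
                                 w_sp: Sequence[float]) -> Vector:
    """Pool each stream separately, then concatenate (an assumed fusion choice)."""
    return attention_pool(acoustic, w_ac) + attention_pool(spatial, w_sp)


if __name__ == "__main__":
    # A 3-frame segment and a 5-frame segment map to the same output size.
    short = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]]
    long_ = [[0.2, 0.2], [0.1, 0.4], [0.3, 0.3], [0.0, 0.1], [0.5, 0.2]]
    print(len(attention_pool(short, [1.0, -1.0])), len(attention_pool(long_, [1.0, -1.0])))
```

With all-zero scorer weights every frame receives the same weight, so the pooling reduces to a plain mean; learned weights let the model emphasize frames that are more informative about overlap, which is the usual motivation for attention pooling over simple averaging.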