Social Signal Detection by Probabilistic Sampling DNN Training
Gábor Gosztolya, Tamás Grósz, László Tóth
DOI: https://doi.org/10.1109/taffc.2018.2871450
IF: 13.99
2020-01-01
IEEE Transactions on Affective Computing
Abstract: When the task is to detect social signals such as laughter and filler events in an audio recording, the most straightforward approach is to apply a Hidden Markov Model, or a Hidden Markov Model/Deep Neural Network (HMM/DNN) hybrid, which is currently considered state-of-the-art. In this hybrid model, the DNN component is trained on frame-level samples of the classes we are looking for. In such event detection tasks, however, the training labels are seriously imbalanced: typically only a small fraction of the training data corresponds to these social signals, while the bulk of the utterances consists of speech segments or silence. A strong imbalance of the training classes is known to cause difficulties during DNN training. To alleviate these problems, we apply a technique called probabilistic sampling, which seeks to balance the class distribution. Probabilistic sampling is a mathematically well-founded combination of upsampling and downsampling, and it was found to outperform both of these simple resampling approaches. With this strategy, we achieved a 7–8 percent relative error reduction at both the segment level and the frame level, and we reduced the DNN training times as well.
computer science, cybernetics, artificial intelligence
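
The abstract describes probabilistic sampling only as a combination of upsampling and downsampling that balances the class distribution. The sketch below illustrates one common formulation, which is an assumption here and is not stated in the abstract: a class is drawn first, with a probability that interpolates between the empirical class prior and a uniform distribution, and a training frame is then drawn uniformly from that class. The parameter name `lam`, the function names, and the toy label proportions are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def class_probabilities(labels, lam):
    """Interpolate between empirical class priors (lam=0) and a uniform
    distribution over classes (lam=1). The formula is an assumed formulation."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {
        c: lam / n_classes + (1.0 - lam) * counts[c] / total
        for c in sorted(counts)
    }


def probabilistic_sample(labels, lam, n_draws, seed=0):
    """Return indices into `labels`: draw a class first, then a frame of that class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    probs = class_probabilities(labels, lam)
    classes = list(probs)
    weights = [probs[c] for c in classes]
    picked = rng.choices(classes, weights=weights, k=n_draws)
    # Rare classes are effectively upsampled, frequent ones downsampled.
    return [rng.choice(by_class[c]) for c in picked]


# Toy frame labels with made-up proportions (not taken from the paper).
labels = ["other"] * 90 + ["filler"] * 8 + ["laughter"] * 2
priors = class_probabilities(labels, lam=0.0)
uniform = class_probabilities(labels, lam=1.0)
batch = probabilistic_sample(labels, lam=1.0, n_draws=3000)
```

Under this assumed formulation, `lam=0` reproduces the original, imbalanced class distribution and `lam=1` samples all classes equally often; intermediate values trade off between the two.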