Bag-Of-Words Representation for Non-Intrusive Speech Quality Assessment

Qiaohong Li, Weisi Lin, Yuming Fang, Daniel Thalmann
DOI: https://doi.org/10.1109/chinasip.2015.7230477
2015-01-01
Abstract: Research on non-intrusive speech quality assessment (SQA) aims to develop a computational model that simulates human perception of speech signals accurately and automatically, without any prior information about the clean reference speech signals. In this paper, we propose to learn a non-intrusive SQA metric based on a bag-of-words (BoW) representation of speech signals. In particular, the proposed method treats the whole speech utterance as a text document and extracts perceptual linear prediction (PLP) features of local segments as words. The speech utterance is then represented as a histogram of codewords, with each entry giving the probability that a codeword appears in the utterance. After the BoW representation of the speech signal is obtained, support vector regression (SVR) is used to learn the metric for quality evaluation. Experimental results demonstrate that the proposed non-intrusive SQA metric, BoW, can obtain better performance than relevant state-of-the-art metrics.
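The histogram step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the codebook is learned or how segments are assigned to codewords, so hard assignment to the nearest codeword by Euclidean distance is an assumption here. The function names `nearest_codeword` and `bow_histogram` are hypothetical, and the toy 2-D vectors stand in for real PLP features, which would come from a separate feature extractor.

```python
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame`.

    Assumption: squared Euclidean distance with hard assignment.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def bow_histogram(frames, codebook):
    """Represent an utterance as normalized codeword frequencies.

    Each entry is the fraction of local segments assigned to that codeword,
    so the entries sum to 1.
    """
    if not frames:
        raise ValueError("need at least one feature vector")
    counts = Counter(nearest_codeword(f, codebook) for f in frames)
    return [counts[k] / len(frames) for k in range(len(codebook))]


# Toy 2-D stand-ins for PLP feature vectors (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.5, 4.2]]

hist = bow_histogram(frames, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

In the pipeline the abstract outlines, a vector like `hist` would be computed for every training utterance and passed, together with subjective quality scores, to a support vector regressor.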