Non-Intrusive Speech Quality Assessment with Multi-Task Learning Based on Tensor Network.

Hanyue Liu, Miao Liu, Jing Wang, Xiang Xie, Lidong Yang
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447695
2024-01-01
Abstract: Non-intrusive speech quality assessment is increasingly important in speech systems, and existing methods predominantly rely on neural networks to extract low-order features. Typically, these features undergo a low-dimensional linear transformation that yields the network’s output, so the intercorrelation between feature points is often overlooked. In this paper, we explore the kernel method, which maps features into a high-dimensional space through dot products, in order to better capture the relationships among all feature points. Considering the unique advantages of tensors in representing complex data, we extend the use of tensor networks and propose a novel framework that incorporates a matrix product state (MPS) layer to predict the mean opinion score (MOS). With the MPS layer, our model can transform low-order features into higher-order representations, enabling a linear transformation in a high-dimensional space without increasing the number of parameters. Furthermore, we propose a loss function that jointly accounts for regression bias, classification bias, and correlation with the real MOS labels. Experimental results demonstrate that our proposed model consistently outperforms the baseline system across all evaluation metrics and surpasses state-of-the-art models on the test set.
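To make the MPS idea concrete, the following is a minimal NumPy sketch of a linear map in a high-dimensional product space whose weight tensor is stored as a matrix product state. It is not the paper's implementation: the local feature map `[1, x_i]`, the bond dimension, the scalar output, and all function names (`feature_map`, `init_mps`, `mps_forward`, `dense_weight`, `dense_forward`) are assumptions chosen for illustration.

```python
import numpy as np

def feature_map(x):
    """Map each scalar feature x_i to a local 2-vector [1, x_i] (assumed choice)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x], axis=-1)  # shape (n, 2)

def init_mps(n, bond_dim, rng):
    """One core of shape (left_bond, 2, right_bond) per feature; boundary bonds are 1."""
    dims = [1] + [bond_dim] * (n - 1) + [1]
    return [rng.normal(scale=0.5, size=(dims[i], 2, dims[i + 1])) for i in range(n)]

def mps_forward(cores, x):
    """Contract the MPS with the product of local feature vectors, left to right."""
    v = np.ones((1,))
    for core, p in zip(cores, feature_map(x)):
        m = np.einsum("d,ldr->lr", p, core)  # contract the physical index
        v = v @ m                            # carry the bond index forward
    return float(v[0])

def dense_weight(cores):
    """Expand the MPS into the full weight tensor of shape (2,)*n (tiny n only)."""
    w = cores[0]
    for core in cores[1:]:
        w = np.tensordot(w, core, axes=([-1], [0]))
    return w.reshape(w.shape[1:-1])  # drop the two boundary bonds of size 1

def dense_forward(cores, x):
    """Same linear map, evaluated with the explicit 2**n-entry weight tensor."""
    w = dense_weight(cores)
    for p in feature_map(x):
        w = np.tensordot(p, w, axes=([0], [0]))
    return float(w)
```

For a fixed bond dimension, the number of stored parameters grows linearly with the number of features, while the equivalent dense weight tensor has 2**n entries. Calling `mps_forward(cores, x)` and `dense_forward(cores, x)` on the same inputs should agree up to floating-point error, which is a quick way to check that the chain contraction really is a linear map in the product feature space.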