Speech Topic Classification Based on Pre-trained and Graph Networks

Fangjing Niu, Tengfei Cao, Ying Hu, Hao Huang, Liang He
DOI: https://doi.org/10.1109/ICME55011.2023.00296
2023-01-01
Abstract: Speech Topic Classification (STC) automatically assigns audio clips to predefined categories and is widely used in short-video platforms, personalized recommendation, and other fields. Current systems typically consist of two stages: automatic speech recognition (ASR) first converts the speech into text, and natural language processing (NLP) then classifies the topic of that text. Most such systems suffer from problems such as error propagation and a lack of global structure. In this paper, we therefore propose a new end-to-end framework based on a pre-trained model and a graph network. The pre-trained model extracts semantic features with sequential structure in place of acoustic features; combining these with global features of the conversational context, constructed by the graph network, achieves good results on the Fisher dataset.