Task-driven Common Subspace Learning Based Semantic Feature Extraction for Acoustic Event Recognition

Qiuying Shi, Shiwen Deng, Jiqing Han
DOI: https://doi.org/10.1016/j.eswa.2023.121045
IF: 8.5
2023-01-01
Expert Systems with Applications
Abstract: For acoustic event recognition (AER), it is important to extract a semantic feature that captures both content information and temporal ordering. To this end, our previous work proposed a common subspace learning (CSL) based method. However, CSL treats subspace learning and back-end classifier training as two separate phases, so the discriminative information available in the latter phase cannot be used to supervise the former. As a result, the feature extracted from the learned subspace does not contain this discriminative information either. To solve this problem, we propose a task-driven CSL (TD-CSL) based method that extracts the semantic feature by learning the two phases jointly. In TD-CSL, the discriminative information obtained during classifier training is used to supervise the learning of the common subspace. Specifically, TD-CSL is formulated as a bi-level optimization problem in which the CSL objective at the lower level acts as a constraint on the classifier-training objective at the upper level. To obtain the optimal subspace and classifier, a gradient-based algorithm is designed. TD-CSL is evaluated on the ESC-50 and ESC-10 databases, where it achieves recognition accuracies of 85.75% and 98.75%, respectively, outperforming CSL and related state-of-the-art methods.
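The abstract describes TD-CSL only at the level of its structure: a lower-level subspace-learning objective that constrains an upper-level classification objective, with both solved by a gradient-based algorithm. The actual CSL objective (which models content and temporal ordering) and the paper's algorithm are not given here, so the following is only a minimal NumPy sketch of that generic task-driven bi-level pattern under stated assumptions. The lower level is a hypothetical ridge-regularized coding problem, Z*(D) = argmin_Z ||X - D Z||^2 + lam ||Z||^2, chosen because it has a closed-form solution; the upper level is the squared loss of a linear classifier W on Z*(D); and the gradient with respect to the subspace D is obtained by differentiating through that closed-form solution. All function names (lower_level_codes, upper_loss_and_grads, train_task_driven, predict), hyperparameters, and the synthetic data are illustrative and not taken from the paper.

```python
import numpy as np


def lower_level_codes(D, X, lam):
    """Hypothetical lower level (a stand-in, not the paper's CSL objective):
    Z*(D) = argmin_Z ||X - D Z||_F^2 + lam * ||Z||_F^2, solved in closed form."""
    k = D.shape[1]
    A = D.T @ D + lam * np.eye(k)          # symmetric positive definite (k, k)
    return np.linalg.solve(A, D.T @ X), A  # codes Z of shape (k, n)


def upper_loss_and_grads(D, W, X, Y, lam):
    """Upper level: squared loss of a linear classifier W on the codes Z*(D).
    Returns the loss and its gradients w.r.t. W and D; the D-gradient is
    propagated through the closed-form lower-level solution."""
    n = X.shape[1]
    Z, A = lower_level_codes(D, X, lam)
    R = W @ Z - Y                          # residuals, shape (c, n)
    loss = 0.5 * np.sum(R ** 2) / n
    grad_W = R @ Z.T / n
    G_Z = W.T @ R / n                      # dL/dZ, shape (k, n)
    P = np.linalg.solve(A, G_Z)            # A is symmetric, so A^-T = A^-1
    # From dZ = A^-1 (dD^T X - (dD^T D + D^T dD) Z):
    grad_D = X @ P.T - D @ (Z @ P.T + P @ Z.T)
    return loss, grad_W, grad_D


def train_task_driven(X, labels, k=4, lam=0.1, lr=0.005, steps=400, seed=0):
    """Jointly update the subspace D and the classifier W by gradient descent
    on the upper-level loss (the lower level is re-solved at every step)."""
    rng = np.random.default_rng(seed)
    d, c = X.shape[0], int(labels.max()) + 1
    Y = np.eye(c)[labels].T                # one-hot targets, shape (c, n)
    D = rng.standard_normal((d, k)) / np.sqrt(d)
    W = np.zeros((c, k))
    history = []
    for _ in range(steps):
        loss, grad_W, grad_D = upper_loss_and_grads(D, W, X, Y, lam)
        W -= lr * grad_W
        D -= lr * grad_D
        history.append(loss)
    return D, W, history


def predict(D, W, X, lam):
    """Encode X with the learned subspace, then pick the highest-scoring class."""
    Z, _ = lower_level_codes(D, X, lam)
    return np.argmax(W @ Z, axis=0)


# Usage on synthetic two-class data (10-dimensional Gaussian blobs).
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(-2.0, 1.0, (10, 50)), rng.normal(2.0, 1.0, (10, 50))])
labels = np.repeat([0, 1], 50)
D, W, history = train_task_driven(X, labels)
acc = np.mean(predict(D, W, X, lam=0.1) == labels)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, training accuracy {acc:.2f}")
```

Because the classification loss reaches D only through Z*(D), every update to the subspace is driven by the classification error, which is the property the abstract attributes to TD-CSL in contrast to the two-phase CSL. A lower-level objective without a closed form would instead need implicit differentiation or an unrolled inner solver to obtain the same gradient.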