Abstract: As populations age, monitoring of domestic activities is becoming increasingly important. Audio tagging of domestic activities is well suited to situations where visual data are unavailable because of interference from lighting and the environment. To address this problem, a neural network model based on the tensor network is proposed for audio tagging of domestic activities; it is more interpretable than traditional neural networks. Introducing the tensor network compresses the network parameters and reduces the redundancy of the trained model while maintaining good performance. First, the important features of a Mel spectrogram of the input audio are extracted by convolutional neural networks (CNNs). Then, these features are mapped into the high-order space corresponding to the tensor network. The spatial structure information and important features are further extracted and retained through the matrix product state (MPS). When the tensor network is used, large patches of the feature data are divided into small, local, orderless patches. The final tagging results are obtained from the MPS layers, a tensor network structure based on the tensor train decomposition. To evaluate the proposed method, the DCASE 2018 Challenge Task 5 dataset for monitoring domestic activities is used. The results show that the average F1-score of the proposed model on the test set of the development dataset and on the validation dataset reaches 87.7% and 85.9%, which are 3.2% and 2.8% higher than the baseline system, respectively. These results indicate that the proposed model performs better and more efficiently for audio tagging of domestic activities.
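
The MPS classification step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the cosine/sine local feature map, the number of sites, the bond dimension, the placement of the label index on the last core, and the nine output classes are all assumptions chosen for illustration, and the names `local_feature_map`, `init_mps`, and `mps_scores` are hypothetical. The sketch shows only how a chain of small tensor-train cores can stand in for one very large weight tensor, which is where the parameter compression comes from.

```python
import numpy as np

def local_feature_map(x):
    """Lift each scalar in [0, 1] to a 2-dim vector (assumed cos/sin map)."""
    return np.stack([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)], axis=-1)

def init_mps(n_sites, phys_dim, bond_dim, n_classes, rng):
    """Random tensor-train cores of shape (left_bond, phys_dim, right_bond).

    The first core has left bond 1; the last core's right index is used
    as the class-label index (an illustrative choice).
    """
    cores = []
    for i in range(n_sites):
        left = 1 if i == 0 else bond_dim
        right = n_classes if i == n_sites - 1 else bond_dim
        cores.append(rng.normal(scale=0.5, size=(left, phys_dim, right)))
    return cores

def mps_scores(cores, x):
    """Contract the MPS with the lifted features, left to right."""
    phi = local_feature_map(x)               # shape (n_sites, phys_dim)
    v = np.ones(1)                           # trivial left boundary vector
    for core, p in zip(cores, phi):
        m = np.einsum("lpr,p->lr", core, p)  # absorb one local feature
        v = v @ m                            # carry the bond index forward
    return v                                 # shape (n_classes,)

# Hypothetical sizes: a 4x4 patch flattened to 16 sites, 9 output tags.
rng = np.random.default_rng(0)
n_sites, phys_dim, bond_dim, n_classes = 16, 2, 4, 9
cores = init_mps(n_sites, phys_dim, bond_dim, n_classes, rng)

x = rng.uniform(0.0, 1.0, size=n_sites)      # stand-in for CNN features
scores = mps_scores(cores, x)

# Parameter count of the MPS versus the full weight tensor it represents.
n_params = sum(core.size for core in cores)
full_params = phys_dim ** n_sites * n_classes
```

With these assumed sizes the cores hold a few hundred parameters, while the equivalent full tensor would need several hundred thousand; the bond dimension controls the trade-off between compression and expressiveness.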

Audio Segment Classification Using Online Learning Based Tensor Representation Feature Discrimination

Online Learning for Classification of Low-rank Representation Features and Its Applications in Audio Segment Classification

Audio Classification with Low-Rank Matrix Representation Features

Pyramidal Temporal Pooling with Discriminative Mapping for Audio Classification

Soft Margin Based Low-Rank Audio Signal Classification

Using Deep Belief Network to Capture Temporal Information for Audio Event Classification

Robust Audio Sensing with Multi-Sound Classification

Audio Segmentation Based On Multi-Scale Audio Classification

A Novel Classification-Based Audio Segmentation Algorithm

A Two-Stage Content-Based Audio Segmentation Algorithm

Auditory Sparse Representation for Robust Speaker Recognition Based on Tensor Structure

Multifactor Sparse Feature Extraction Using Convolutive Nonnegative Tucker Decomposition

Large Scale Environmental Sound Classification Based on Efficient Feature Extraction

Listening and Grouping: an Online Autoregressive Approach for Monaural Speech Separation

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder

Neural Network Model Based on the Tensor Network for Audio Tagging of Domestic Activities

A Robust Time-frequency Decomposition Model for Suppression of Mixed Gaussian-impulse Noise in Audio Signals

Optimizing Cepstral Features for Audio Classification

Trace Norm Regularized Tensor Classification and Its Online Learning Approaches

Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent with Illustrations of Speech Processing

Learning Long-Term Filter Banks for Audio Source Separation and Audio Scene Classification