Abstract: With the exponential growth in the production of multimedia data, there is an increasing need for video semantic analysis. Audio, as a significant part of video, provides important cues to human perception when viewers browse and try to understand video content. To detect semantic content from useful audio information, we introduce audio keywords, which are sets of specific sounds related to semantic events. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. However, a weakness of that work is that audio signals are artificially segmented into 20 ms frames for frame-based SVM identification, without any contextual information. In this paper, we propose a classification method based on the Hidden Markov Model (HMM) for audio keyword identification, as an improvement over the hierarchical SVM classifier. The choice of HMM is motivated by its success in speech recognition. Unlike frame-based SVM classification followed by majority voting, our proposed HMM-based classifiers treat a specific sound as continuous time series data and employ hidden state transitions to capture contextual information. In particular, we study how to find an effective HMM, i.e., how to determine the topology, observation vectors, and statistical parameters of the HMM. We also compare HMM structures with different numbers of hidden states, and adjust for time series data of variable length. The experimental data comprise 40 minutes of basketball audio taken from real-time sports games. Experimental results show that, for audio keyword generation, the proposed HMM-based method outperforms the previous hierarchical SVM.
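The idea of scoring a whole sound segment with one HMM per audio keyword, rather than voting over independently classified frames, can be illustrated with a minimal sketch. This is not the paper's implementation: the keyword names (`whistle`, `applause`), the two-state topology, the one-dimensional features, and all parameter values are assumptions chosen only for illustration. Each model uses diagonal-covariance Gaussian emissions, and the forward algorithm in log space gives the likelihood of a segment of any length.

```python
import numpy as np

def log_gaussian(x, means, variances):
    """Log density of x under each state's diagonal Gaussian; returns shape (n_states,)."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=-1)

def log_likelihood(obs, log_pi, log_A, means, variances):
    """Forward algorithm in log space: log p(obs | HMM) for a (T, D) feature sequence."""
    # alpha[j] = log p(o_1..o_t, state_t = j)
    alpha = log_pi + log_gaussian(obs[0], means, variances)
    for x in obs[1:]:
        # Sum over previous states i of alpha[i] + log_A[i, j], then add the emission term.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
        alpha = alpha + log_gaussian(x, means, variances)
    return np.logaddexp.reduce(alpha)

def classify(obs, models):
    """Pick the keyword whose HMM assigns the segment the highest log-likelihood."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get)

def make_model(means):
    """Hypothetical 2-state HMM with fixed transitions and unit variances (illustration only)."""
    log_pi = np.log(np.array([0.9, 0.1]))
    log_A = np.log(np.array([[0.8, 0.2], [0.1, 0.9]]))
    means = np.asarray(means, dtype=float)
    return log_pi, log_A, means, np.ones_like(means)

# Hypothetical keyword models; in practice the parameters would be trained, e.g. with Baum-Welch.
models = {
    "whistle": make_model([[2.0], [4.0]]),
    "applause": make_model([[-2.0], [-4.0]]),
}

# Segments may have different lengths; the forward recursion handles both the same way.
segment_a = np.array([[2.1], [1.9], [3.8], [4.2], [4.0]])
segment_b = np.array([[-2.2], [-3.9], [-4.1]])
print(classify(segment_a, models), classify(segment_b, models))
```

Because the transition matrix couples consecutive frames, the score depends on the order of the observations, which is the contextual information that independent per-frame decisions followed by majority voting discard.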

Identification of Objectionable Audio Segments Based on Pseudo and Heterogeneous Mixture Models

Facial feature points detecting based on Gaussian Mixture Models

Semisupervised Robust Modeling of Multimode Industrial Processes for Quality Variable Prediction Based on Student's T Mixture Model

Efficient video object segmentation based on Gaussian mixture model and Markov random field

Using MCE Algorithm to Improve the Performance of Speaker Recognition

Robust text-independent speaker identification using Gaussian mixture speaker models

Speaker Identification based on LSP and Gaussian Mixture Model

Gaussian mixture model for relevance feedback in image retrieval

Sound event detection in remote health care - small learning datasets and over constrained Gaussian Mixture Models

Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces

HMM-Based Audio Keyword Generation

Classification of Facial Images Using Gaussian Mixture Models

GMM-HMM Acoustic Model Training by a Two Level Procedure with Gaussian Components Determined by Automatic Model Selection

Discriminative Dynamic Gaussian Mixture Selection with Enhanced Robustness and Performance for Multi-Accent Speech Recognition

Hybrid SVM/HMM Approach for Audio Classification

Using Subband Mel-spectrum Centroid and Gaussian Mixture Correlation for Robust Speaker Identification

Boosting Gaussian Mixture Models Via Discriminant Analysis

A Novel Split and Merge EM Algorithm for Gaussian Mixture Model

A Greedy Merge Learning Algorithm for Gaussian Mixture Model

Improvement of hidden Markov model (HMM) for speech recognition

Multiple Background Models for Speaker Verification