Subspace Models For Bottleneck Features

Jun Qi, Dong Wang, Javier Tejedor
DOI: https://doi.org/10.21437/interspeech.2013-434
2013-01-01
Abstract: The bottleneck (BN) feature, particularly when based on deep structures, has achieved significant success in automatic speech recognition (ASR). However, applying the BN feature to small/medium-scale tasks is nontrivial. An obvious reason is that limited training data prevents the training of a complicated deep network; another, more subtle reason is that the BN feature tends to possess high inter-dimensional correlation, making it ill-suited to modeling with the conventional diagonal Gaussian mixture model (GMM). This difficulty can be mitigated by increasing the number of Gaussian components and/or employing full covariance matrices. These approaches, however, are not applicable to small/medium-scale tasks, for which only a limited amount of training data is available. In this paper, we study the subspace Gaussian mixture model (SGMM) for BN features. The SGMM assumes full but shared covariance matrices, and hence can address the inter-dimensional correlation in a parsimonious way. This is particularly attractive for the BN feature, especially on small/medium-scale tasks, where the inter-dimensional correlation is high but full covariance modeling is not affordable due to the limited training data. Our preliminary experiments on the Resource Management (RM) database demonstrate that the SGMM can deliver significant performance improvement for ASR systems based on BN features.
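The parsimony argument can be illustrated with a rough count of covariance parameters. The sketch below is not taken from the paper: the function name and all sizes (number of states, Gaussians per state, number of shared covariances, feature dimension) are hypothetical values chosen only for illustration, and it counts covariance parameters alone, ignoring means, weights, and the other parameters an SGMM also carries.

```python
def covariance_param_counts(num_states, gaussians_per_state, num_shared, dim):
    """Count covariance parameters under three modeling choices.

    All arguments are illustrative assumptions, not values from the paper.
    """
    # A symmetric dim x dim matrix has dim * (dim + 1) / 2 free entries.
    full_per_matrix = dim * (dim + 1) // 2
    total_gaussians = num_states * gaussians_per_state
    return {
        # One variance per dimension for every state-specific Gaussian.
        "diagonal_gmm": total_gaussians * dim,
        # One full matrix for every state-specific Gaussian.
        "full_gmm": total_gaussians * full_per_matrix,
        # Full matrices shared across all states, one per shared component.
        "shared_full": num_shared * full_per_matrix,
    }


# Hypothetical sizes, assumed only for this example.
counts = covariance_param_counts(
    num_states=1500, gaussians_per_state=8, num_shared=400, dim=39
)
for name, value in counts.items():
    print(f"{name}: {value}")
```

Under these assumed sizes, the shared full covariances need fewer parameters than even the diagonal model while still capturing correlation between dimensions, which is the trade-off the abstract describes.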