Reducing Tongue Shape Dimensionality from Hundreds of Available Resources Using Autoencoder

Minghao Yang, Dawei Zhang, Jianhua Tao
DOI: https://doi.org/10.1109/icpr.2018.8545185
2018-01-01
Abstract: Despite the variety of observation tools, tongue shape data remain a scarce resource in practice. The autoencoder, a kind of deep neural network (DNN), performs well on data reduction and pattern discovery. However, because an autoencoder usually needs large-scale training data, it is challenging for a traditional autoencoder to learn tongue motion patterns from only tens or hundreds of available tongue shapes. To overcome this problem, we propose a two-step autoencoder: we first construct a stacked denoising autoencoder (dAE) to learn the essential representation of the tongue shapes from their possible deformations; then an additional autoencoder with a small number of hidden units is added on top of the stacked autoencoder and used for dimensionality reduction. Experiments are run on 240 vowel tongue shapes obtained from X-ray films of Chinese speakers' pronunciation, and the proposed model is compared in detail with a traditional dAE and classical principal component analysis (PCA) on dimensionality reduction and reconstruction. The results validate the performance of the proposed tongue model.
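The two-step idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer sizes, activations, or training settings, so everything below is an assumption made for illustration. In particular, the data are synthetic (240 random 20-dimensional vectors standing in for tongue contours), the first step uses a single denoising layer rather than a full stack, and the helper names `train_autoencoder`, `encode`, and `decode` are hypothetical. A PCA baseline with the same code size is included for comparison.

```python
import numpy as np

def train_autoencoder(X, n_hidden, noise_std=0.0, epochs=400, lr=0.005, seed=0):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    with full-batch gradient descent. If noise_std > 0, the input is corrupted
    with Gaussian noise while the target stays clean (denoising)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        X_in = X + noise_std * rng.normal(size=X.shape)   # corrupt the input
        H = np.tanh(X_in @ W1 + b1)                       # hidden code
        err = (H @ W2 + b2) - X                           # compare with clean X
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        g_out = 2.0 * err / n                             # dLoss/dOutput
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)           # backprop through tanh
        W2 -= lr * (H.T @ g_out); b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X_in.T @ g_hid); b1 -= lr * g_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

def encode(params, X):
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)

def decode(params, H):
    _, _, W2, b2 = params
    return H @ W2 + b2

# Synthetic stand-in data (assumption): 240 shapes, 20 coordinates each,
# generated from 3 latent factors plus a little noise, then standardized.
rng = np.random.default_rng(42)
X = rng.normal(size=(240, 3)) @ rng.normal(size=(3, 20))
X = X + 0.05 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 1: denoising autoencoder learns a representation from perturbed shapes.
p1, losses1 = train_autoencoder(X, n_hidden=16, noise_std=0.1)
H1 = encode(p1, X)

# Step 2: a small autoencoder on top of the step-1 codes reduces them to 3 dims.
p2, losses2 = train_autoencoder(H1, n_hidden=3, seed=1)
code3 = encode(p2, H1)
X_rec = decode(p1, decode(p2, code3))          # decode back through both steps
ae_err = float(np.mean((X_rec - X) ** 2))

# PCA baseline with the same number of dimensions.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
pca_code = (X - mean) @ Vt[:3].T
pca_err = float(np.mean((pca_code @ Vt[:3] + mean - X) ** 2))

print(f"two-step AE reconstruction MSE: {ae_err:.4f}")
print(f"PCA (3 components) reconstruction MSE: {pca_err:.4f}")
```

On this toy data the two reconstruction errors say nothing about the paper's reported results; the sketch only shows how a low-dimensional code can be obtained by stacking a small autoencoder on a denoising one and compared against PCA at equal code size.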