Dominant Subspace Analysis for Auditory Spectrum
Xugang Lu, Gang Li, Lipo Wang
DOI: https://doi.org/10.21437/icslp.2000-477
2000-01-01
Abstract: In hearing perception theory, spectral structure is one of the most important features for speech perception, and this structure is not easily masked in noisy conditions. If this structure is extracted and enhanced, the representation becomes much more robust. In this paper, we propose a new statistical dominant subspace analysis method for the auditory spectrum, based on singular value decomposition (SVD) and the signal subspace analysis method. The auditory spectrum can be decomposed into two subspaces: a dominant subspace, which is spanned by the useful speech auditory spectrum, and a sub-dominant subspace, which contains only noise information. If we analyze the auditory spectrum in the dominant subspace, the SNR increases, so the representation is much more robust.
1. COMPUTATIONAL AUDITORY MODEL AND AUDITORY SPECTRUM
The auditory nervous system represents a speech stimulus in several stages. First, the basilar membrane decomposes it into many frequency bands; then, after processing by the inner hair cells and neural fibers, its intensity is represented by the neural firing rate. These neural impulses are transmitted to the central auditory system, where they are perceived by the auditory cortex [1]. In this paper, all the processing stages are integrated using digital signal processing methods; when a speech signal is processed by this model, an auditory feature is obtained. The basic processing framework is shown in Fig. 1. The system is made up of six parts, including the high-pass filtering of the outer and middle ear, the band-pass filtering of the basilar membrane, the nonlinear compression and half-wave rectification of the inner hair cells, the low-pass filtering of the neural fibers, and the energy detection of the central system.
Figure 1. Auditory model for speech signal processing.
A mathematical model is designed to simulate this auditory function, as shown in Fig. 2; a low-pass filter is used to simulate the long temporal integration mechanism.
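The processing chain described above might be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `auditory_spectrum`, the pre-emphasis coefficient, the log-spaced center frequencies, the Gaussian band shapes, the square-root compression, the 5 ms smoothing window, and the frame sizes are all assumptions chosen for brevity, since the text does not specify any of them.

```python
import numpy as np


def auditory_spectrum(x, fs, n_channels=16, frame_len=0.020, hop=0.010):
    """Toy auditory spectrum: rows are frames, columns are frequency channels."""
    x = np.asarray(x, dtype=float)

    # 1. Outer/middle ear: first-order high-pass filter
    #    (pre-emphasis; the 0.95 coefficient is an assumption).
    hp = np.concatenate(([x[0]], x[1:] - 0.95 * x[:-1]))

    # 2. Basilar membrane: band-pass filterbank applied in the FFT domain
    #    (circular filtering). Log-spaced centers and Gaussian band shapes
    #    are assumptions, not taken from the paper.
    spectrum = np.fft.rfft(hp)
    freqs = np.fft.rfftfreq(len(hp), d=1.0 / fs)
    centers = np.geomspace(100.0, 0.9 * fs / 2, n_channels)
    bands = np.array([
        np.fft.irfft(
            spectrum * np.exp(-0.5 * ((freqs - fc) / (0.15 * fc)) ** 2),
            n=len(hp),
        )
        for fc in centers
    ])  # shape: (n_channels, n_samples)

    # 3. Inner hair cell: half-wave rectification followed by
    #    square-root compression (the compression law is an assumption).
    ihc = np.sqrt(np.maximum(bands, 0.0))

    # 4. Neural fiber / temporal integration: moving-average low-pass filter.
    win = int(round(0.005 * fs))
    kernel = np.ones(win) / win
    smooth = np.array([np.convolve(ch, kernel, mode="same") for ch in ihc])

    # 5. Central system: mean energy per frame and per channel.
    n_frame = int(round(frame_len * fs))
    n_hop = int(round(hop * fs))
    starts = range(0, smooth.shape[1] - n_frame + 1, n_hop)
    energy = np.array(
        [np.mean(smooth[:, s:s + n_frame] ** 2, axis=1) for s in starts]
    )
    return energy, centers
```

As a quick sanity check, feeding the function a pure tone should concentrate the energy in a channel whose center frequency lies near the tone frequency.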
The outer and middle ear are simulated by a high-pass filter; the basilar membrane by band-pass filters; the inner hair cells by half-wave rectification; the neural fibers by a low-pass filter; and the central nervous system by an energy detector with log compression. Finally, a DCT is used to obtain the feature vector.
Figure 2. The mathematical model for Figure 1.
In these modules, the short-term adaptation and rapid adaptation of the inner hair cells and neural fibers are not considered. Also, the temporal integration of the central nervous system and the low-pass filtering of the neural fibers are combined into a single low-pass filter. The energy detector measures the intensity in each frequency channel. After processing by this model, an auditory feature is obtained, which can be used for training and testing. In this paper, we focus only on auditory spectrum analysis, so the auditory spectrum is taken from the output of the energy detector in Figure 2. Visual representations of the FFT spectrum, the LPC spectrum, and the auditory spectrum (AS) of a Chinese sentence are drawn for comparison in Figure 3.
Figure 3. Top: FFT spectrum. Middle: LPC spectrum. Bottom: auditory spectrum (AS).
From Fig. 3, it is clear that the AS is a wide-band spectrum, while the FFT spectrum is a narrow-band spectrum. The AS can be regarded as a smoothed version of the FFT spectrum on the hearing perception scale (in the frequency domain). It is also clear that the speech representation produced by the auditory system is a series of time-frequency patches, and that these patches differ from noise patches. We can regard the speech feature as a continuous time-frequency patch with regular structure, whereas noise patches are random and irregular. We therefore hope that a subspace decomposition method can use this property to separate noise from speech.
2. SIGNAL SUBSPACE AND SVD
Signal subspace analysis is widely used in digital signal processing and pattern recognition [2].
It is assumed that the useful information lies only in some lower-dimensional subspaces, while noise is uniformly distributed over the whole measurement space (the whole Euclidean space). The subspace analysis method decomposes the whole measurement space into useful subspaces, such as a signal subspace and a noise subspace, or a dominant subspace and a subordinate subspace. When the original feature is projected onto the dominant subspace, only the dominant structure is retained; that is, the subordinate feature (which includes the noise structure) is reduced. From the transformation point of view, we hope to find a new transform basis such that the feature obtained by this transformation has certain properties, for example that the dimensions of the feature vector are uncorrelated or independent. In this paper, we propose using the subspace analysis method for this purpose.
Singular value decomposition (SVD) is a very useful method for decomposing matrix structure. We give here some formulas that are used later. Suppose the data matrix is $X \in \mathbb{R}^{m \times n}$. Then there exist orthogonal matrices $U = (u_1, \ldots, u_m) \in \mathbb{R}^{m \times m}$ and $V = (v_1, \ldots, v_n) \in \mathbb{R}^{n \times n}$ satisfying: