Suppression by Selecting Wavelets for Feature Compression in Distributed Speech Recognition

Syu-Siang Wang, Payton Lin, Yu Tsao, Jeih-Weih Hung, Borching Su
DOI: https://doi.org/10.1109/TASLP.2017.2779787
2018-03-01
Abstract: Distributed speech recognition (DSR) splits the processing of data between a mobile device and a network server. In the front end, features are extracted and compressed for transmission over a wireless channel to a back-end server, where the incoming stream is received and reconstructed for recognition tasks. In this paper, we propose a feature compression algorithm termed suppression by selecting wavelets (SSW) to achieve the two main goals of DSR: minimizing memory and device requirements while maintaining or even improving recognition performance. At the client terminal, the SSW approach first applies the discrete wavelet transform (DWT) to filter the incoming speech feature sequence into two temporal subsequences. Feature compression is achieved by keeping the low modulation frequency subsequence and discarding its high-frequency counterpart. The low-frequency subsequence is then transmitted across the remote network for specific feature statistics normalization. Wavelets are well suited to resolving the temporal properties of the feature sequence, and the down-sampling step of the DWT compresses the data by reducing the amount of data at the terminal prior to transmission across the network. Once the compressed features arrive at the server, the feature sequence can be enhanced by statistics normalization, reconstructed with the inverse DWT, and compensated with a simple post filter to alleviate over-smoothing effects introduced by the compression stage. Results on a standard robustness task (Aurora-4) and on a Mandarin Chinese news corpus showed that SSW outperforms conventional noise-robustness techniques while also providing a compression rate of nearly 50% during the transmission stage of DSR systems.
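The client/server split described in the abstract can be sketched for a single feature dimension (one coefficient trajectory over time). This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a one-level Haar wavelet (the abstract does not name the wavelet), uses mean and variance normalization (MVN) as a stand-in for the unspecified "statistics normalization", and omits the post filter because its form is not given in the abstract. The function names `haar_dwt`, `haar_idwt`, `mvn`, `ssw_client`, and `ssw_server` are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar DWT along time (assumed wavelet): returns (low, high) bands."""
    x = list(x)
    if len(x) % 2:
        x.append(x[-1])  # pad an odd-length trajectory by repeating the last frame
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return low, high


def haar_idwt(low: List[float], high: List[float]) -> List[float]:
    """Inverse one-level Haar DWT: interleaves the reconstructed frame pairs."""
    out: List[float] = []
    for a, d in zip(low, high):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def mvn(x: List[float]) -> List[float]:
    """Mean and variance normalization (assumed stand-in for statistics normalization)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var) if var > 0 else 1.0
    return [(v - mean) / std for v in x]


def ssw_client(trajectory: List[float]) -> List[float]:
    """Client side: keep only the low modulation frequency band for transmission."""
    low, _high = haar_dwt(trajectory)  # high band is discarded, halving the data
    return low


def ssw_server(received_low: List[float], n_frames: int) -> List[float]:
    """Server side: normalize the low band, then inverse DWT with a zeroed high band."""
    normalized = mvn(received_low)
    rebuilt = haar_idwt(normalized, [0.0] * len(normalized))
    return rebuilt[:n_frames]  # drop the padding frame, if any
```

For an 8-frame trajectory, `ssw_client` sends 4 values instead of 8, which is where the roughly 50% transmission saving comes from. Because the high band is replaced by zeros, each reconstructed frame pair is identical; this is the over-smoothing that the paper's post filter is meant to compensate.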
Engineering, Electrical & Electronic; Acoustics