The design and validation of a hybrid digital-signal-processing plug-in for traditional cochlear implant speech processors

Fatemeh Hajiaghababa, Hamid R Marateb, Saeed Kermani
DOI: https://doi.org/10.1016/j.cmpb.2018.03.003
Abstract:
Background and objective: Cochlear implants (CIs) are electronic devices that restore partial hearing to deaf individuals with profound hearing loss. In this paper, a new plug-in for traditional IIR filter-banks (FBs), based on wavelet neural networks (WNNs), is presented for cochlear implants. With such a plug-in for commercially available CIs, it is possible not only to use hardware already on the market but also to optimize its performance relative to the state of the art.
Methods: An online database of Dutch diphone perception was used in our study. The weights of the WNNs were tuned using particle swarm optimization (PSO) on a training set (speech-shaped noise (SSN) at 2 dB SNR), while their performance was assessed on a test set in terms of objective and composite measures in the hold-out validation framework. The cost function was defined as a combination of the mean square error (MSE) and short-time objective intelligibility (STOI) criteria on the training set. A variety of performance indices were used, including segmental signal-to-noise ratio (SNRseg), MSE, STOI, log-likelihood ratio (LLR), weighted spectral slope (WSS), and the composite measures Csig, Cbak and Covl. The following CI speech processing techniques were used for comparison: traditional FBs, dual resonance nonlinear (DRNL) and simple dual path nonlinear (SPDN) models.
Results: The average SNRseg, MSE, and LLR values for the WNN in the entire data set were 2.496 ± 2.794, 0.086 ± 0.025 and 2.323 ± 0.281, respectively. The proposed method significantly improved MSE, SNR, SNRseg, LLR, Csig, Cbak and Covl compared with the other three methods (repeated-measures analysis of variance (ANOVA); P < 0.05). The average running time of the proposed algorithm (written in Matlab R2013a) for each consonant or vowel, on an Intel dual-core 2.10 GHz CPU with 2 GB of RAM, was 9.91 ± 0.87 s on the training set and 0.19 ± 0.01 s on the test set.
Conclusions: The proposed algorithm is accurate and precise and is thus a promising new plug-in for traditional CIs. Although the tuned algorithm is relatively fast, it is necessary to use efficient vectorized implementations for real-time CI speech signal processing.
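The abstract reports MSE and SNRseg values but does not spell out how they were computed. As a rough illustration only, the Python sketch below uses a commonly used form of these two measures: MSE as the mean squared sample difference, and segmental SNR as the average of per-frame SNRs in dB, each clamped to a fixed range. The function names, the frame length of 256 samples, the non-overlapping framing, and the clamping limits of -10 dB and 35 dB are assumptions made for this example and are not taken from the paper, whose implementation was written in Matlab.

```python
import math


def mse(clean, processed):
    """Mean square error between two equal-length sample sequences."""
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    return sum((c - p) ** 2 for c, p in zip(clean, processed)) / len(clean)


def segmental_snr(clean, processed, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average per-frame SNR in dB over non-overlapping frames.

    frame_len, lo_db and hi_db are illustrative assumptions,
    not values reported in the abstract.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    eps = 1e-12  # avoids division by zero and log of zero
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    reference = [math.sin(0.1 * n) for n in range(512)]
    degraded = [x + 0.1 for x in reference]
    print("MSE:", mse(reference, degraded))
    print("SNRseg (dB):", segmental_snr(reference, degraded))
```

Averaging per-frame values rather than computing one global ratio keeps low-energy speech segments from being masked by louder ones, which is one reason a segmental measure is often reported alongside overall SNR.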