Speaker Recognition for DSR

Mohamed Abdel Fattah, Fuji Ren, Shingo Kuroiwa
2005-01-01
Abstract: Due to the coexistence of different compression algorithms in the fixed and mobile telephone networks, it is impossible to predict which combination of coders and channels the speech has undergone before arriving at the server. To overcome this problem, the European Telecommunications Standards Institute (ETSI) has standardized a front-end for Distributed Speech Recognition (DSR). However, the distortion introduced by feature compression at the front-end increases the variance flooring effect, which in turn increases the identification error rate. The penalty incurred in reducing the bitrate is thus a degradation in speaker recognition performance. In this paper we present a nontraditional solution to these problems. To reduce the bitrate, the speech signal is segmented at the client, and the most effective phonemes for speaker recognition are selected and sent to the server. Speaker recognition is then performed at the server. Applying this approach to the YOHO corpus, we achieve an identification error rate (ER) of 0.05% using, on average, a segment of 20.4% of the test utterance for recognition. This result outperforms previously published results on the speaker identification task in terms of both ER and the minimum speech segment required for speaker identification.