3D Convolutional Neural Networks Based Speaker Identification and Authentication.

Jianguo Liao, Shilin Wang, Xingxuan Zhang, Gongshen Liu
DOI: https://doi.org/10.1109/icip.2018.8451204
2018-01-01
Abstract: Research shows that human lips can serve as a biometric for personal identification and authentication. In this letter, a novel end-to-end method based on a 3D convolutional neural network (3DCNN) is proposed to extract discriminative spatiotemporal features from raw lip video streams. In our approach, the lip video is first divided into a series of overlapping clips. For each clip, a lip-characteristics network is proposed to characterize the minutiae of the lip region and its movement. Finally, the entire lip video is represented by the set of sub-features corresponding to its clips. Experiments were performed on a dataset with 200 speakers, and the proposed method achieves a high identification accuracy of 99.18% and a very low authentication error (half total error rate, HTER, of 0.15%). Compared with several state-of-the-art methods, our approach achieves better performance and higher robustness against variations in speakers' pose and position.
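The first step described in the abstract, dividing a lip video into overlapping clips, can be illustrated with a minimal sketch. The function name `split_into_clips` and the parameters `clip_len` and `stride` are hypothetical: the abstract does not state the clip length or the amount of overlap, so the values used here are placeholders only, not the authors' settings.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_clips(
    frames: Sequence[Frame], clip_len: int, stride: int
) -> List[List[Frame]]:
    """Split a frame sequence into fixed-length clips.

    Consecutive clips overlap whenever stride < clip_len.
    Both parameters are assumptions; the abstract gives no values.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")

    clips = []
    # Slide a window of clip_len frames forward by stride frames each step.
    # Trailing frames that cannot fill a whole clip are dropped.
    for start in range(0, len(frames) - clip_len + 1, stride):
        clips.append(list(frames[start:start + clip_len]))
    return clips


# Integers stand in for lip-region frames in this illustration.
video = list(range(10))
clips = split_into_clips(video, clip_len=4, stride=2)
```

In a pipeline like the one the abstract outlines, each clip would then be passed to the per-clip network, and the resulting sub-features would together represent the whole video.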