Bimodal speaker identification using dynamic Bayesian network

Dongdong Li, LiFeng Sang, Yingchun Yang, Zhaohui Wu
DOI: https://doi.org/10.1007/978-3-540-30548-4_66
2004-01-01
Abstract: The authentication of a person requires consistently high recognition accuracy, which is difficult to attain using a single recognition modality. This paper assesses the fusion of voiceprint and face features for bimodal speaker identification using a Dynamic Bayesian Network (DBN). Our contribution is a general feature-level fusion framework for bimodal speaker identification. Within the framework, the voice and face features are combined into a single DBN to obtain better performance than either single-modality system alone. The tests were conducted on a multi-modal database of 54 users who provided voiceprint and face data of different speech types and content. We compare our approach with mono-modal systems and other classic decision-level methods and show that feature-level fusion using a dynamic Bayesian network improved performance by about 4-5%, much better than the others.
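The abstract contrasts feature-level fusion with decision-level fusion without showing either. The sketch below illustrates only that general distinction, not the paper's DBN model: the function names, the frame-wise concatenation, and the weighted sum rule are assumptions chosen for illustration, and the scores stand in for the outputs of hypothetical per-modality classifiers.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse_features(voice_frames: Sequence[Vector],
                  face_frames: Sequence[Vector]) -> List[Vector]:
    """Feature-level fusion (illustrative): join time-aligned voice and
    face feature vectors into one vector per frame, so that a single
    model can be trained on the combined observation."""
    if len(voice_frames) != len(face_frames):
        raise ValueError("streams must be aligned to the same number of frames")
    return [list(v) + list(f) for v, f in zip(voice_frames, face_frames)]


def fuse_decisions(voice_scores: Dict[str, float],
                   face_scores: Dict[str, float],
                   voice_weight: float = 0.5) -> str:
    """Decision-level fusion (illustrative weighted sum rule): each
    modality is scored by its own classifier, and only the per-speaker
    scores are combined before picking the best speaker."""
    if voice_scores.keys() != face_scores.keys():
        raise ValueError("both classifiers must score the same speakers")
    combined = {
        speaker: voice_weight * voice_scores[speaker]
        + (1.0 - voice_weight) * face_scores[speaker]
        for speaker in voice_scores
    }
    return max(combined, key=combined.get)


if __name__ == "__main__":
    # Hypothetical toy data: two frames, 2-D voice and 1-D face features.
    print(fuse_features([[1.0, 2.0], [3.0, 4.0]], [[0.5], [0.6]]))
    # Hypothetical per-speaker scores from two separate classifiers.
    print(fuse_decisions({"a": 0.6, "b": 0.4}, {"a": 0.2, "b": 0.8}))
```

In the feature-level case the classifier sees both modalities jointly at every frame, whereas in the decision-level case the modalities interact only through their final scores.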