Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

Wu Zhiyong,Cai Lianhong,Meng Helen M.
DOI: https://doi.org/10.1007/978-3-540-37258-5_144
2006-01-01
Abstract: This paper investigates how to estimate fusion weights under varying acoustic noise conditions for an audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model-level and decision-level fusion via dynamic Bayesian networks (DBNs). A novel methodology based on support vector regression (SVR) is used to estimate the fusion weights directly from audio features; a Sigma-Pi network sampling method is also incorporated to reduce the feature dimensions. Experiments on a homegrown Chinese database and the CMU English database both demonstrate that the method improves the accuracy of audio-visual bimodal speaker identification under dynamically varying acoustic noise conditions.
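To illustrate why the fusion weight matters at the decision level, the following is a minimal sketch, not the authors' implementation. It assumes a simple linear combination of per-speaker log-likelihood scores from the two modalities, which the abstract does not specify. The names `fuse_scores`, `identify`, and `audio_weight`, as well as all score values, are hypothetical; in the paper the weight would come from the SVR estimator rather than being set by hand.

```python
from typing import Dict


def fuse_scores(audio: Dict[str, float],
                visual: Dict[str, float],
                audio_weight: float) -> Dict[str, float]:
    """Linearly combine per-speaker scores (assumed fusion rule).

    audio_weight is the weight given to the audio stream; the visual
    stream receives 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        speaker: audio_weight * audio[speaker]
        + (1.0 - audio_weight) * visual[speaker]
        for speaker in audio
    }


def identify(audio: Dict[str, float],
             visual: Dict[str, float],
             audio_weight: float) -> str:
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(audio, visual, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical log-likelihood scores for two enrolled speakers.
audio_scores = {"spk_a": -10.0, "spk_b": -12.0}
visual_scores = {"spk_a": -9.0, "spk_b": -6.0}

# A high audio weight (clean audio assumed) trusts the audio stream.
clean_decision = identify(audio_scores, visual_scores, audio_weight=0.9)

# A low audio weight (noisy audio assumed) shifts trust to the visual stream.
noisy_decision = identify(audio_scores, visual_scores, audio_weight=0.2)
```

With these made-up scores the two weights lead to different decisions, which is the motivation for estimating the weight from the audio signal as noise conditions change.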