Deep Neural Network Bottleneck Features for Bird Species Verification

Jinming Zhao, Yanyan Xu, Dengfeng Ke, Kaile Su
DOI: https://doi.org/10.1109/ijcnn.2017.7965951
2017-01-01
Abstract: Bottleneck features have recently been used successfully as effective representations in Speaker Recognition (SR) and Language Recognition (LR), but little work has focused on bottleneck features for Bird Species Verification (BSV). In SR, LR and BSV tasks, short-time spectral features alone may be insufficient, so more abstract and discriminative representations are needed to complement conventional spectral features. Some SR and LR work shows that bottleneck features can form a low-dimensional representation of the original inputs with powerful descriptive and discriminative capability. Because the general principles of audio representation for speakers, languages and birds are similar, we hypothesize that bottleneck features are also useful for BSV. Therefore, in this paper, we use a bottleneck feature framework built on the standard i-vector framework to address crucial problems in conventional BSV methods, such as session variability and insufficient features. Moreover, we make no distinction between bird calls and bird songs in the evaluation phase. Experimental results show that the standard i-vector system and the bottleneck feature system achieve Equal Error Rates (EER) of 3.39% and 0.85%, respectively. The bottleneck feature system thus obtains a 75% relative improvement over the standard i-vector system, indicating that bottleneck features are significantly useful for BSV as a complement to spectral features. The deep feature system, another state-of-the-art framework based on deep features used in SR, reaches only 18.64% EER, which is much worse than the other two systems; a brief explanation is provided in this paper.
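The arithmetic behind the reported relative improvement, and the EER metric itself, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `relative_improvement` and `equal_error_rate` are hypothetical, the EER is approximated by a simple threshold sweep, and the toy scores are made up for demonstration. Only the two EER values (3.39% and 0.85%) come from the abstract.

```python
def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction of a new system over a baseline (as a fraction)."""
    return (baseline_eer - new_eer) / baseline_eer


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, and reported as their average.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# EER values reported in the abstract (in percent).
gain = relative_improvement(3.39, 0.85)
print(f"relative improvement: {gain:.1%}")

# Toy verification scores, invented purely for illustration.
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_nontargets = [0.6, 0.3, 0.2, 0.1]
print(f"toy EER: {equal_error_rate(toy_targets, toy_nontargets):.2%}")
```

With the abstract's figures, the relative reduction works out to (3.39 - 0.85) / 3.39, or roughly 74.9%, which matches the rounded 75% quoted above.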