Birdsong classification based on multi-feature fusion
Na Yan, Aibin Chen, Guoxiong Zhou, Zhiqiang Zhang, Xiangyong Liu, Jianwu Wang, Zhihua Liu, Wenjie Chen
DOI: https://doi.org/10.1007/s11042-021-11396-9
IF: 2.577
2021-09-08
Multimedia Tools and Applications
Abstract: Birdsong classification is important for monitoring bird populations in their habitats. For birdsong datasets with complex and diverse audio backgrounds, this paper introduces Chroma, an acoustic feature used in voice and music analysis. Chroma is spliced and fused with two commonly used birdsong features, the Log-Mel Spectrogram (LM) and Mel Frequency Cepstral Coefficients (MFCC), to enrich the representational capacity of any single feature. In addition, because birdsong changes continuously and dynamically over time, a combined 3DCNN-LSTM model is proposed as the classifier to make the network more sensitive to time-varying birdsong information. We selected four sets of bird audio data from the Xeno-Canto website to evaluate how LM, MFCC and Chroma should be fused to maximize the information extracted from birdsong audio. The experimental results show that the LM-MFCC-C feature combination achieves the best result, a mean average precision (mAP) of 97.9%.
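The splicing step mentioned in the abstract can be pictured as stacking per-frame feature matrices along the feature axis. The sketch below is only an illustration under assumptions: the helper name splice_features, the toy matrix sizes, and the plain-list representation are hypothetical and not taken from the paper, which does not specify its fusion code here. In practice the LM, MFCC and Chroma matrices would come from an audio library and would be much larger.

```python
from typing import List

# Assumption: each feature is a matrix with rows = feature bins
# and columns = time frames, computed over the same frames.
Matrix = List[List[float]]


def splice_features(*features: Matrix) -> Matrix:
    """Stack feature matrices along the feature (row) axis.

    Hypothetical helper: every input must have the same number of frames.
    """
    if not features or not features[0]:
        raise ValueError("at least one non-empty feature matrix is required")
    n_frames = len(features[0][0])
    fused: Matrix = []
    for feat in features:
        for row in feat:
            if len(row) != n_frames:
                raise ValueError("all features must share the same frame count")
            fused.append(list(row))  # copy so the inputs stay untouched
    return fused


# Toy stand-ins (sizes are illustrative only): 4 LM bins, 2 MFCCs,
# 3 chroma bins, each over 3 time frames.
lm = [[0.1 * (r + c) for c in range(3)] for r in range(4)]
mfcc = [[1.0 + r - c for c in range(3)] for r in range(2)]
chroma = [[0.5 * r for _ in range(3)] for r in range(3)]

fused = splice_features(lm, mfcc, chroma)
print(len(fused), len(fused[0]))
```

The fused matrix keeps the time axis intact, so a sequence-aware classifier can still see how every feature evolves frame by frame.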
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering