Birdsong classification based on multi feature channel fusion

Zhihua Liu, Wenjie Chen, Aibin Chen, Guoxiong Zhou, Jizheng Yi
DOI: https://doi.org/10.1007/s11042-022-12570-3
IF: 2.577
2022-02-28
Multimedia Tools and Applications
Abstract: To exploit the time continuity that is an essential feature of birdsong in nature, this paper proposes a birdsong classification model composed of two feature channels, which combines time-domain and time-frequency-domain features. To make better use of these features, an improved average threshold method is used to denoise the original time-domain waveform and reduce the influence of noise. The most suitable feature extractor for each feature and the best method of fusing the two are discussed. A 3D convolutional neural network (3DCNN) and a 2D convolutional neural network (2DCNN) are applied as feature extractors for the log-mel spectrum and the waveform images, respectively. The high-level features extracted from the two channels are then fused at the middle stage, and the resulting enhanced feature is used as the input to a double gated recurrent unit (d-GRU) network. Birdsongs of four species from Xeno-Canto were selected for testing. The results show that three methods each improve classification performance: fusing time-domain and time-frequency-domain features, weighted average threshold noise reduction, and extracting birdsong features with different types of feature extractors. The proposed method achieves a mean average precision (MAP) of 95.9% in the classification comparison experiments, which is an encouraging result.
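The abstract reports its headline result as mean average precision (MAP). As a rough illustration of what that metric measures, the sketch below computes a per-class average precision from ranked classifier scores and then averages over classes. It assumes the common ranking-based definition of MAP; the abstract does not state the paper's exact evaluation protocol, and the function names and toy scores here are hypothetical, not taken from the paper.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision for one class.

    scores[i] is the classifier's confidence that recording i belongs to the
    class; labels[i] is 1 if it truly does, else 0. (Assumed definition: the
    mean of precision@k over every rank k that holds a true positive.)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank recordings from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # A class with no positives contributes 0.0 by convention in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    label_matrix: Sequence[Sequence[int]],
) -> float:
    """Mean of the per-class average precisions; inputs are indexed [class][recording]."""
    if not score_matrix:
        raise ValueError("at least one class is required")
    aps = [average_precision(s, l) for s, l in zip(score_matrix, label_matrix)]
    return sum(aps) / len(aps)


# Toy example with two hypothetical classes and four recordings.
toy_scores = [[0.9, 0.8, 0.7, 0.6], [0.2, 0.9, 0.1, 0.8]]
toy_labels = [[1, 0, 1, 0], [0, 1, 0, 1]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

In the toy data, the first class has positives at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2, while the second class ranks both positives first and scores 1.0; the MAP is the mean of the two.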
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering