Abstract: Birds are members of ecosystems and can serve as monitors of the ecological environment, so bird recognition, especially birdsong recognition, has attracted increasing attention in the field of artificial intelligence. At present, both traditional machine learning and deep learning are widely used in birdsong recognition. Deep learning can not only classify and recognize the spectral features of birdsong but also serve as a feature extractor, whereas traditional machine learning is often used to classify the handcrafted feature parameters extracted from birdsong. Because the birdsong features are the data samples of the classifier, they directly determine its performance. Multi-view features, obtained with different feature extraction methods, capture more complete information about birdsong. Therefore, to enrich the representational capacity of a single feature and to find a better way of combining features, this paper proposes a birdsong classification model based on multi-view features, which combines deep features extracted by a convolutional neural network (CNN) with handcrafted features. First, four kinds of handcrafted features are extracted: the wavelet transform (WT) spectrum, the Hilbert-Huang transform (HHT) spectrum, the short-time Fourier transform (STFT) spectrum and Mel-frequency cepstral coefficients (MFCC). Then a CNN is used to extract deep features from the WT, HHT and STFT spectra, and minimal-redundancy-maximal-relevance (mRMR) feature selection is applied to select the optimal features. Finally, three classification models (random forest, support vector machine and multi-layer perceptron) are built on the deep features and the handcrafted features, and the classification probabilities obtained from the two types of features are fused into new features for birdsong recognition.
In experiments on sixteen bird species, the three classifiers achieve accuracies of 95.49%, 96.25% and 96.16%, respectively, with the features of the proposed method, outperforming the seven single features and three fused features included in the experiment. The proposed method effectively combines deep features and handcrafted features from the perspective of the audio signal. The fused features express the information in the bird audio more comprehensively and achieve higher classification accuracy with a lower dimension, which effectively improves the performance of bird audio classification.
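The probability-level fusion can be sketched as follows, assuming that "fusing the probabilities" means concatenating the per-class probability vector a classifier produces from the deep features with the one a classifier produces from the handcrafted features, and passing the result to a second-stage classifier (not shown). The function name `fuse_probabilities` and the random Dirichlet stand-ins for classifier outputs are hypothetical; NumPy is assumed. Under this assumption, with sixteen species the fused vector has 2 × 16 = 32 entries regardless of the size of the original features, which is one way to read the "lower dimension" result.

```python
import numpy as np


def fuse_probabilities(p_deep, p_hand):
    """Probability-level fusion (assumed form): concatenate the class-probability
    outputs of a classifier trained on deep features and one trained on
    handcrafted features into a single new feature vector per sample.

    p_deep, p_hand: arrays of shape (n_samples, n_classes).
    Returns an array of shape (n_samples, 2 * n_classes).
    """
    p_deep = np.asarray(p_deep, dtype=float)
    p_hand = np.asarray(p_hand, dtype=float)
    if p_deep.ndim != 2 or p_deep.shape != p_hand.shape:
        raise ValueError("expected two (n_samples, n_classes) arrays of equal shape")
    for p in (p_deep, p_hand):
        # Each row must be a probability distribution over the classes.
        if np.any(p < 0) or not np.allclose(p.sum(axis=1), 1.0):
            raise ValueError("each row must be non-negative and sum to 1")
    # The fused width depends only on the number of classes, not on the
    # dimensionality of the original deep or handcrafted feature vectors.
    return np.hstack([p_deep, p_hand])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_classes = 5, 16  # sixteen bird species, as in the abstract
    # Hypothetical stand-ins for predict_proba-style outputs of two classifiers.
    p_deep = rng.dirichlet(np.ones(n_classes), size=n_samples)
    p_hand = rng.dirichlet(np.ones(n_classes), size=n_samples)
    fused = fuse_probabilities(p_deep, p_hand)
    print(fused.shape)  # (5, 32)
```

When the fused features are used to train the second-stage classifier, the first-stage probabilities are commonly produced out-of-fold (for example by cross-validation) so that they are not computed on the same samples the first-stage models were fitted to; the abstract does not state how this is handled.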

A Cross-Modal Semantic Alignment and Feature Fusion Method for Bionic Drone and Bird Recognition

Adaptive Switching Spatial-Temporal Fusion Detection for Remote Flying Drones

Alignment and Fusion Using Distinct Sensor Data for Multimodal Aerial Scene Classification

Learnable Cross-Scale Sparse Attention Guided Feature Fusion for UAV Object Detection

Cross-Modal Oriented Object Detection of UAV Aerial Images Based on Image Feature

Airport Near-Altitude Flying Birds Detection Based on Information Compensation Multiscale Feature Fusion

Cross-domain Deep Feature Combination for Bird Species Classification with Audio-visual Data

Fusion-Mamba for Cross-modality Object Detection

Fusing Local Shallow Features and Global Deep Features to Identify Beaks

A Human–Computer Fusion Framework for Aircraft Recognition in Remote Sensing Images

Deformable Convolution-Guided Multiscale Feature Learning and Fusion for UAV Object Detection

Micro-Motion Classification of Flying Bird and Rotor Drones via Data Augmentation and Modified Multi-Scale CNN

Multi-scale object detection in UAV images based on adaptive feature fusion

A Flying Bird Object Detection Method for Surveillance Video

Attention-Guided Multi-Scale Fusion Network for Similar Objects Semantic Segmentation

Modality Meets Long-Term Tracker: A Siamese Dual Fusion Framework for Tracking UAV

Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding

Multi-view features fusion for birdsong classification

BAFusion: Bidirectional Attention Fusion for 3D Object Detection Based on LiDAR and Camera

A Novel Bird Sound Recognition Method Based on Multifeature Fusion and a Transformer Encoder

Finger Multimodal Feature Fusion and Recognition Based on Channel Spatial Attention