Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras

Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, Guillermo Gallego
2024-10-09
Abstract: Event cameras are novel bio-inspired vision sensors that measure pixel-wise brightness changes asynchronously instead of images at a given frame rate. They offer promising advantages, namely a high dynamic range, low latency, and minimal motion blur. Modern computer vision algorithms often rely on artificial neural network approaches, which require image-like representations of the data and cannot fully exploit the characteristics of event data. We propose approaches to action recognition based on the Fourier Transform. The approaches are intended to recognize oscillating motion patterns commonly present in nature. In particular, we apply our approaches to a recent dataset of breeding penguins annotated for "ecstatic display", a behavior where the observed penguins flap their wings at a certain frequency. We find that our approaches are both simple and effective, producing slightly lower results than a deep neural network (DNN) while relying just on a tiny fraction of the parameters compared to the DNN (five orders of magnitude fewer parameters). They work well despite the uncontrolled, diverse data present in the dataset. We hope this work opens a new perspective on event-based processing and action recognition.
Computer Vision and Pattern Recognition, Emerging Technologies
What problem does this paper attempt to address?
This paper addresses how to recognize wild-animal behaviors from event-camera data using Fourier-transform-based methods, focusing on the "ecstatic display" (ED) behavior of penguins. Specifically, the research aims to develop an efficient, low-power behavior recognition method that overcomes the limitations of conventional frame-based cameras in complex natural environments.

### Research Background and Problem Description

1. **Advantages of Event Cameras**:
   - An event camera is a bio-inspired vision sensor. It records pixel-wise brightness changes asynchronously, rather than capturing images at a fixed frame rate like a conventional camera.
   - Event cameras offer high dynamic range, low latency, and minimal motion blur, which makes them well suited to behavior recognition in complex natural environments.
2. **Limitations of Existing Methods**:
   - Most current computer vision algorithms rely on deep neural networks (DNNs). These models require large amounts of training data and computing resources and are difficult to deploy in low-power scenarios.
   - Image-based methods cannot fully exploit the characteristics of event data, such as high temporal resolution and high sensitivity to motion.
3. **Research Objectives**:
   - Propose a Fourier-transform-based action recognition method for the oscillatory motion patterns that are common in nature.
   - Apply the method to a dataset of breeding penguins annotated for "ecstatic display" to verify its effectiveness.
   - Explore how the characteristics of event cameras can be used to build simpler, more efficient classifiers for low-power, real-time behavior recognition.

### Method Overview

1. **Data Pre-processing**:
   - The event data is summarized into a low-dimensional signal, the signed event rate \( r[k] \), which is then transformed into the frequency domain \( R[f] \) by the Fourier transform.
   - The energy within a frequency band \( [f_l, f_u] \) is computed as \( E_{f_l, f_u} = 2 \int_{f_l}^{f_u} |R[f]|^2 df \) and then normalized by the total energy.
2. **Classification Methods**:
   - **Energy-Band Classifier**: Classifies by comparing the normalized band energy \( \hat{E}_{f_l, f_u} \) against a threshold \( \lambda \):
     \[ \hat{E}_{f_l, f_u} = \frac{E_{f_l, f_u}}{E_{0, \infty}} \]
     If \( \hat{E}_{f_l, f_u} > \lambda \), the sample is classified as containing an "ecstatic display".
   - **Full-Spectrum Classifier**: Uses the full spectrum to train two artificial neural network (ANN) classifiers, one built from fully connected layers and one from convolutional layers.
3. **Experiments and Evaluation**:
   - Experiments use a dataset of penguin behavior recorded during the breeding period in Antarctica to evaluate the different classifiers.
   - The main evaluation metrics are precision, recall, and F1 score.

### Experimental Results

- **Energy-Band Classifier**: Simple and effective, with a very small number of parameters (only 54) and an F1 score of 0.54.
- **Full-Spectrum Classifier**: Performs slightly better than the energy-band classifier but has more parameters (1,700 and 40,600 for the two variants, respectively).
- **2D CNN Classifier**: Achieves the best performance (F1 score of 0.72) but has 11.4 million parameters, which makes it unsuitable for low-power scenarios.

### Conclusion

This research proposes a Fourier-transform-based behavior recognition method that recognizes the "ecstatic display" behavior of penguins with far fewer parameters than deep-learning methods, at a somewhat lower F1 score. Besides its small parameter count, the method is more interpretable, offering a new direction for real-time, low-power behavior recognition applications.
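
The energy-band pipeline summarized above (signed event rate, Fourier transform, normalized band energy, threshold) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the way events are binned into the signed event rate, and the band limits and threshold used in the usage lines are all hypothetical choices made for the example.

```python
import numpy as np


def signed_event_rate(timestamps, polarities, t_start, t_end, bin_width):
    """Summarize events as r[k]: summed polarity (+1/-1) per time bin, per second.

    Assumption: the exact binning used in the paper may differ.
    """
    n_bins = int(round((t_end - t_start) / bin_width))
    edges = t_start + bin_width * np.arange(n_bins + 1)
    signed_counts, _ = np.histogram(timestamps, bins=edges, weights=polarities)
    return signed_counts / bin_width


def normalized_band_energy(r, sample_rate, f_lower, f_upper):
    """Fraction of the signal's spectral energy inside [f_lower, f_upper] (in Hz)."""
    r = np.asarray(r, dtype=float)
    spectrum = np.fft.rfft(r)
    freqs = np.fft.rfftfreq(r.size, d=1.0 / sample_rate)
    # One-sided spectrum: double every bin that has a negative-frequency twin,
    # i.e. all bins except DC and (for even length) the Nyquist bin.
    weights = np.full(freqs.shape, 2.0)
    weights[0] = 1.0
    if r.size % 2 == 0:
        weights[-1] = 1.0
    energy = weights * np.abs(spectrum) ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0  # no events at all: no energy in any band
    in_band = (freqs >= f_lower) & (freqs <= f_upper)
    return float(energy[in_band].sum() / total)


def energy_band_classifier(r, sample_rate, f_lower, f_upper, threshold):
    """Return True when the normalized band energy exceeds the threshold (lambda)."""
    return normalized_band_energy(r, sample_rate, f_lower, f_upper) > threshold


if __name__ == "__main__":
    sample_rate = 100.0  # Hz; hypothetical rate of the binned signal
    t = np.arange(200) / sample_rate
    flapping = np.sin(2 * np.pi * 5.0 * t)  # synthetic 5 Hz oscillation
    # Band limits and threshold are illustrative values, not taken from the paper.
    print(energy_band_classifier(flapping, sample_rate, 4.0, 6.0, threshold=0.5))
```

Because the classifier is a single ratio compared against a threshold, its only tunable quantities in this sketch are the band limits and \( \lambda \), which is why such an approach needs so few parameters compared with a CNN.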