Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras

Kamil Jeziorek, Andrea Pinna, Tomasz Kryjak
DOI: https://doi.org/10.23919/SPA59660.2023.10274464
2023-07-26
Abstract: Recent advances in event camera research emphasize processing data in its original sparse form, which allows the use of its unique features such as high temporal resolution, high dynamic range, low latency, and resistance to image blur. One promising approach for analyzing event data is the use of graph convolutional networks (GCNs). However, current research in this domain primarily focuses on optimizing computational costs, neglecting the associated memory costs. In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity. For this purpose, we performed a comparative analysis of different graph convolution operations, considering factors such as execution time, the number of trainable model parameters, data format requirements, and training outcomes. Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher compared to the operation used in state-of-the-art approaches. To further evaluate performance, we implemented an object detection architecture and evaluated it on the N-Caltech101 dataset. The results showed an accuracy of 53.7% mAP@0.5 and an execution rate of 82 graphs per second.
Computer Vision and Pattern Recognition, Image and Video Processing
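To make "a graph convolution on event data" concrete, here is a minimal sketch, not the authors' implementation. It assumes each event becomes a node at normalized (x, y, t) coordinates, connects nodes within a fixed radius, and applies a PointNet-style operator: a shared linear map over the concatenation [x_j, p_j - p_i], a ReLU, and max aggregation over neighbors. Because relative positions are recomputed from node coordinates inside the layer, the graph stores only edge indices and no per-edge attribute tensor. The function names, the radius, the single input feature, and the random weights are all hypothetical.

```python
import math
import random


def build_radius_graph(pos, radius):
    """Directed edges (j, i) for every node pair within `radius`, self-loops included.

    Brute-force O(N^2) search, suitable only for a toy example.
    """
    edges = []
    for i, p_i in enumerate(pos):
        for j, p_j in enumerate(pos):
            if math.dist(p_i, p_j) <= radius:  # j == i always passes: self-loop
                edges.append((j, i))
    return edges


def linear(weights, bias, vec):
    """Dense layer: `weights` holds one row of len(vec) values per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def pointnet_conv(x, pos, edges, weights, bias):
    """h_i = max over edges (j, i) of ReLU(W [x_j, p_j - p_i] + b).

    Relative positions are computed on the fly from `pos`, so no edge
    attributes have to be stored alongside the edge list.
    """
    out = [[-math.inf] * len(bias) for _ in x]
    for j, i in edges:
        rel = [pj - pi for pj, pi in zip(pos[j], pos[i])]
        msg = [max(0.0, v) for v in linear(weights, bias, x[j] + rel)]
        out[i] = [max(a, m) for a, m in zip(out[i], msg)]  # max aggregation
    return out


# Usage on random toy "events" with normalized (x, y, t) coordinates.
random.seed(0)
pos = [(random.random(), random.random(), random.random()) for _ in range(20)]
x = [[1.0] for _ in pos]  # one input feature per event, e.g. polarity (assumption)
edges = build_radius_graph(pos, radius=0.3)
in_dim, out_dim = len(x[0]) + 3, 8
weights = [[random.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
bias = [0.0] * out_dim
h = pointnet_conv(x, pos, edges, weights, bias)
print(len(h), len(h[0]))
```

The layer has (feature dim + 3) x out_dim weights plus out_dim biases, independent of how many neighbors each event has. A practical model would use a library layer with a small MLP rather than this hand-rolled loop.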
### What problem does this paper attempt to solve?

This paper addresses memory efficiency in event camera data processing. Specifically, the authors study how to substantially reduce memory consumption, while maintaining computational performance, when graph convolutional networks (GCNs) are used for object classification and detection.

#### Main problems:

1. **High memory consumption**: Existing research mainly optimizes computational cost and ignores the associated memory cost, so models run into memory bottlenecks when processing large-scale data.
2. **Redundant data representation**: Traditional event processing methods (such as projecting events onto a dense two-dimensional pseudo-representation or reconstructing them into grayscale frames) discard the key characteristics of event cameras, such as high temporal resolution and data sparsity, and are computationally expensive.
3. **Limitations of existing GCN methods**: Although previous studies have shown that GCNs can process event data, these methods usually need to store a large amount of edge-attribute information, which further increases memory requirements.

#### Solutions:

To address these problems, the authors optimize the memory efficiency of graph convolutional networks for event camera data in the following ways:

- **Comparing different graph convolution operations**: The authors analyze how different graph convolution operations affect memory and computational resources and select more efficient ones (such as PointNetConv), reducing the number of model parameters and the memory footprint.
- **Reducing edge-attribute information**: By omitting unnecessary edge attributes, the authors significantly reduce the memory required for the data representation.
- **Optimizing the model architecture**: By designing a more compact architecture (for example, using residual connections and multi-scale pooling), the authors improve both the memory efficiency and the computational performance of the model.

#### Experimental results:

- **Fewer parameters**: The number of parameters in the feature extraction module is reduced 450-fold.
- **Smaller data representation**: The memory required for the data representation is reduced 4.5-fold.
- **Classification accuracy**: The classification task reaches 52.3% accuracy, 6.3% higher than with the graph convolution operation used in state-of-the-art approaches.
- **Detection performance**: On the N-Caltech101 dataset, the detection model reaches 53.7% mAP@0.5 and processes 82 graphs per second.

In conclusion, by optimizing the memory efficiency of graph convolutional networks, this paper offers a more efficient approach to event camera data processing that reduces memory consumption while maintaining good performance.
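The parameter and data-representation reductions come from different sources. As a rough, hedged illustration, the sketch below counts (a) the bytes of a graph stored in coordinate (COO) form with and without a 3-value attribute per edge, and (b) the weights of a 64-to-64-channel layer under two parameterizations: a spline-kernel convolution with one weight matrix per kernel control point, plus a root weight and bias (assumed here as the baseline, since the text does not name the state-of-the-art operation), and a single shared linear map over [x_j, p_j - p_i]. All sizes, dtypes, the kernel size, and the function names are hypothetical, and the printed ratios are not meant to reproduce the paper's 450-fold and 4.5-fold figures.

```python
def graph_bytes(num_nodes, num_edges, feat_dim, pos_dim=3, edge_attr_dim=0,
                float_bytes=4, index_bytes=8):
    """Bytes for a COO graph: node features + positions, edge index, edge attributes."""
    nodes = num_nodes * (feat_dim + pos_dim) * float_bytes
    edge_index = 2 * num_edges * index_bytes  # (source, target) pair per edge
    edge_attr = num_edges * edge_attr_dim * float_bytes  # grows with the edge count
    return nodes + edge_index + edge_attr


def spline_conv_params(in_ch, out_ch, kernel_size=5, dim=3):
    """One in_ch x out_ch matrix per kernel control point, plus root weight and bias."""
    control_points = kernel_size ** dim
    return control_points * in_ch * out_ch + in_ch * out_ch + out_ch


def pointnet_conv_params(in_ch, out_ch, pos_dim=3):
    """A single shared linear map over [x_j, p_j - p_i], plus bias."""
    return (in_ch + pos_dim) * out_ch + out_ch


# Hypothetical sizes: 10,000 events, 16 neighbors each, 1 feature, float32, int64 indices.
n, e = 10_000, 16 * 10_000
with_attr = graph_bytes(n, e, feat_dim=1, edge_attr_dim=3)
without_attr = graph_bytes(n, e, feat_dim=1)
print(f"graph: {with_attr} B with edge_attr, {without_attr} B without "
      f"({with_attr / without_attr:.2f}x)")

spline = spline_conv_params(64, 64)
pointnet = pointnet_conv_params(64, 64)
print(f"64->64 layer: {spline} vs {pointnet} parameters ({spline / pointnet:.1f}x)")
```

Edge attributes scale with the number of edges, which is typically many times the number of nodes, so recomputing relative positions inside the layer removes that whole term from the stored graph. Likewise, the shared linear map's parameter count does not scale with kernel_size ** dim the way a spline kernel's does.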