Improving Multiple Dense Prediction Performances by Exploiting Inter-Task Synergies for Neuromorphic Vision Sensors

Tian Zhang, Zhaokun Li, Jianbo Su, Jingjing Fang
DOI: https://doi.org/10.1109/jsen.2024.3411088
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: In recent years, neuromorphic vision sensors (NVS) have advanced the performance of dense prediction in visual perception thanks to their unique properties. Although prior works have designed elaborate pipelines for individual dense prediction problems, they overlook the substantial information synergies among tasks and compensate for this with complex models and large amounts of labeled data. To this end, we propose to exploit inter-task synergies to improve the performance of dense predictions in the NVS domain and introduce the first multitask learning (MTL) model for NVS. Specifically, our NVS-oriented MTL model employs a hard parameter sharing scheme. A shared encoder extracts a universal representation from the input event streams, which is then branched into several task-specific decoders that separate the domain-specific information of individual tasks. To account for the distinctive nature of event streams, a hierarchical recurrent vision Transformer (RViT) backbone is proposed as the shared encoder. It is capable of modeling both global and local spatial context from sparse event signals while also leveraging the temporal cues within them. Extensive evaluations are conducted on a recent NVS benchmark (DSEC) to verify our model. Our method outperforms all baselines as well as the state-of-the-art (SOTA) MTL networks designed for conventional cameras on two dense prediction tasks simultaneously. In addition, extensive ablations demonstrate the effectiveness of our architectural design and component selections. We believe our research paves the way for introducing the MTL strategy in the field of event-based vision.
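The hard parameter sharing scheme described in the abstract (one shared encoder that carries temporal state across the event stream, branched into task-specific decoders) can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the class names `SharedRecurrentEncoder`, `TaskDecoder`, and `HardSharingMTL`, the task names `task_a` and `task_b`, the dimensions, and the random weights are hypothetical, and the encoder is a plain per-pixel recurrent cell standing in for the paper's RViT backbone, not a reproduction of it.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def init_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedRecurrentEncoder:
    """Toy per-pixel recurrent encoder shared by all tasks (NOT the paper's RViT)."""

    def __init__(self, in_dim, feat_dim, rng):
        self.feat_dim = feat_dim
        self.w_in = init_matrix(feat_dim, in_dim, rng)
        self.w_rec = init_matrix(feat_dim, feat_dim, rng)

    def step(self, pixel, state):
        # Combine the current event features with the carried temporal state.
        a = linear(self.w_in, pixel)
        b = linear(self.w_rec, state)
        return [math.tanh(x + y) for x, y in zip(a, b)]


class TaskDecoder:
    """Task-specific head mapping shared features to one task's per-pixel output."""

    def __init__(self, feat_dim, out_dim, rng):
        self.w = init_matrix(out_dim, feat_dim, rng)

    def __call__(self, feature):
        return linear(self.w, feature)


class HardSharingMTL:
    """Hard parameter sharing: one encoder, one decoder per task."""

    def __init__(self, in_dim, feat_dim, task_dims, seed=0):
        rng = random.Random(seed)
        self.encoder = SharedRecurrentEncoder(in_dim, feat_dim, rng)
        self.decoders = {
            name: TaskDecoder(feat_dim, out_dim, rng)
            for name, out_dim in task_dims.items()
        }

    def forward(self, event_bins):
        """event_bins: list over time of frames; a frame is a list of per-pixel vectors."""
        n_pixels = len(event_bins[0])
        states = [[0.0] * self.encoder.feat_dim for _ in range(n_pixels)]
        for frame in event_bins:
            states = [self.encoder.step(p, s) for p, s in zip(frame, states)]
        # One shared representation, branched into every task decoder.
        return {
            name: [decoder(s) for s in states]
            for name, decoder in self.decoders.items()
        }


# Hypothetical setup: two tasks with 3 and 1 output channels per pixel.
model = HardSharingMTL(in_dim=2, feat_dim=4, task_dims={"task_a": 3, "task_b": 1})
bins = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]  # 2 time bins, 2 pixels
outputs = model.forward(bins)
```

In this sketch both decoders read the same per-pixel features, so any training signal from one task would update the encoder weights that the other task also relies on; that shared path is where inter-task synergies would be exploited in a trained model.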