Learnable fusion mechanisms for multimodal object detection in autonomous vehicles

Yahya Massoud, Robert Laganiere
DOI: https://doi.org/10.1049/cvi2.12259
IF: 1.484
2024-03-17
IET Computer Vision
Abstract: Perception systems in autonomous vehicles need to accurately detect and classify objects in their surrounding environments. These vehicles carry numerous types of sensors, and combining such multimodal data streams can significantly boost performance. The authors introduce a novel sensor fusion framework based on deep convolutional neural networks that employs both camera and LiDAR sensors in a multimodal, multiview configuration. To leverage both data types, they introduce two new fusion mechanisms: element‐wise multiplication and multimodal factorised bilinear pooling. Compared with traditional fusion operators such as element‐wise addition and feature map concatenation, these methods improve the bird's eye view moderate average precision score on the KITTI dataset by +4.97% and +8.35%. The authors also offer an in‐depth analysis of key design choices that affect performance, such as data augmentation, multi‐task learning, and convolutional architecture design. The study aims to pave the way for more robust multimodal machine vision systems. The paper concludes with qualitative results, discussing both successful and problematic cases along with potential ways to mitigate the latter.
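The abstract names four fusion operators but gives no formulas for them. The sketch below illustrates all four on two feature maps. It assumes the maps are already aligned on a common spatial grid with matching channel counts, which the abstract does not describe. The MFB function follows the general formulation of multimodal factorised bilinear pooling from the visual question answering literature (Yu et al., 2017). Each input is projected with a learnable matrix, and the projections are multiplied element-wise. The product is sum-pooled over groups of k factors and then power- and L2-normalised. Dropout is omitted. The function names, the factor count k, the output size o, the random weights, and the feature-map sizes are hypothetical illustrations, not the authors' configuration.

```python
import numpy as np

def fuse_add(a, b):
    """Element-wise addition baseline; a and b must have identical shapes."""
    return a + b

def fuse_concat(a, b):
    """Concatenation baseline; stacks the two inputs along the channel axis."""
    return np.concatenate([a, b], axis=-1)

def fuse_mul(a, b):
    """Element-wise (Hadamard) multiplication of two same-shape feature maps."""
    return a * b

def fuse_mfb(x, y, U, V, k):
    """Multimodal factorised bilinear pooling, generic formulation (assumed).

    x: (..., dx) features from one modality; y: (..., dy) from the other.
    U: (dx, k*o) and V: (dy, k*o) are the (hypothetical) learnable projections.
    Returns (..., o) fused features, L2-normalised along the last axis.
    """
    joint = (x @ U) * (y @ V)                      # (..., k*o) low-rank bilinear term
    o = joint.shape[-1] // k
    # Sum-pool each group of k consecutive factors into one output channel.
    pooled = joint.reshape(*joint.shape[:-1], o, k).sum(axis=-1)
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))   # power normalisation
    norm = np.linalg.norm(pooled, axis=-1, keepdims=True)
    return pooled / np.maximum(norm, 1e-12)               # L2 normalisation

# Hypothetical camera and LiDAR bird's eye view feature maps on a shared 4x5 grid.
rng = np.random.default_rng(0)
cam = rng.standard_normal((4, 5, 32))
bev = rng.standard_normal((4, 5, 32))
U = 0.1 * rng.standard_normal((32, 5 * 16))   # k = 5 factors, o = 16 outputs
V = 0.1 * rng.standard_normal((32, 5 * 16))

fused = fuse_mfb(cam, bev, U, V, k=5)
print(fuse_add(cam, bev).shape)      # (4, 5, 32)
print(fuse_concat(cam, bev).shape)   # (4, 5, 64)
print(fuse_mul(cam, bev).shape)      # (4, 5, 32)
print(fused.shape)                   # (4, 5, 16)
```

Element-wise addition and multiplication require both inputs to share a shape, and concatenation doubles the channel count. MFB maps the inputs to a chosen output size o independent of their widths, at the cost of two learnable projection matrices.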
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
What problem does this paper attempt to address?