Revisiting Modality Imbalance In Multimodal Pedestrian Detection

Arindam Das, Sudip Das, Ganesh Sistu, Jonathan Horgan, Ujjwal Bhattacharya, Edward Jones, Martin Glavin, Ciarán Eising
2023-07-07
Abstract: Multimodal learning, particularly for pedestrian detection, has recently received emphasis because it can function equally well in several critical autonomous driving scenarios such as low-light, night-time, and adverse weather conditions. However, in most cases the training distribution largely emphasizes the contribution of one specific input, which biases the network towards one modality. Hence, generalization becomes a significant problem when the input modality that was non-dominant during training contributes more during inference. Here, we introduce a novel training setup with a regularizer in the multimodal architecture to resolve this disparity between the modalities. Specifically, our regularizer term makes the feature fusion more robust by treating both feature extractors as equally important during training when extracting the multimodal distribution; we refer to this as removing the imbalance problem. Furthermore, decoupling the output stream helps the detection task by letting the streams share spatially sensitive information with each other. Extensive experiments with the proposed method on the KAIST and UTokyo datasets show improvements over the respective state-of-the-art results.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the modality imbalance problem in multimodal pedestrian detection. Multimodal learning, especially for pedestrian detection, performs well in critical autonomous driving scenarios such as low-light, night-time, and adverse weather conditions. In most cases, however, the training data distribution over-emphasizes one input modality, which biases the network towards it. This bias limits the model's ability to generalize, especially when the modality that was non-dominant during training contributes more at inference time.

### Specific manifestations of the problem

1. **Modality bias**: For example, if a network is trained only on night-time images, it may become biased towards the thermal modality, which limits its generalization to other scenarios.
2. **Unbalanced feature fusion**: Different modalities (such as visible light and thermal infrared) perform very differently in some scenarios, so the information extracted from each modality becomes unbalanced during feature fusion.

### Solutions

To address these problems, the authors introduce a new training framework and regularization method that balance the information contributed by the different modalities and make multimodal feature fusion more robust. Specific measures include:

1. **Logarithmic Sobolev inequality**: A regularizer based on the logarithmic Sobolev inequality is introduced so that the information extracted from the two modalities is treated as equally important.
2. **Multi-stream decoupled detection branch**: A decoupled, multi-stream detection branch lets related tasks share spatially sensitive information more effectively.
3. **Systematic ablation study**: Extensive experiments examine the effects of different backbone networks, training strategies, and network components, verifying the effectiveness and robustness of the method.
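
Item 1 of the Solutions list names the logarithmic Sobolev inequality without stating it. For background, its classical Gaussian form (Gross, 1975) is reproduced below, where $\gamma$ is the standard Gaussian measure on $\mathbb{R}^n$ and $f$ is a smooth function with $\int f^2\,d\gamma < \infty$. The summary does not say which form or constant the paper uses, or how the inequality is turned into a loss term, so this is reference material rather than the paper's regularizer.

```latex
\mathrm{Ent}_{\gamma}\!\left(f^{2}\right) \le 2 \int_{\mathbb{R}^{n}} \lvert \nabla f \rvert^{2} \, d\gamma,
\qquad
\mathrm{Ent}_{\gamma}(g) = \int g \log g \, d\gamma - \left( \int g \, d\gamma \right) \log\!\left( \int g \, d\gamma \right)
```

The left side is the entropy of $f^2$, which is zero exactly when $f^2$ is constant; the right side measures how strongly $f$ varies. For $f(x) = e^{a \cdot x}$ both sides equal $2\lvert a\rvert^{2} e^{2\lvert a\rvert^{2}}$, so the constant 2 cannot be improved.
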
### Experimental results

The authors evaluated the method on two public datasets, KAIST and UTokyo. The results show that it improves multimodal pedestrian detection performance over the respective state-of-the-art methods.

### Summary

The main contribution of this paper is a new end-to-end multimodal architecture with a cost-function regularizer that addresses the modality imbalance problem in multimodal pedestrian detection and yields more accurate pedestrian detection.
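
The summary describes the contribution as a cost-function regularizer but does not give its formula. As an illustration only, the sketch below adds a hypothetical balance penalty to a detection loss: it measures each stream's share of the total feature energy (L2 norm) and penalizes deviation from an even split. This is a stand-in, not the paper's logarithmic-Sobolev-based regularizer; the function names, the norm-based share, and the weight `lam = 0.1` are assumptions, and plain Python lists replace the framework tensors a real training loop would differentiate through.

```python
import math
from typing import Sequence


def l2_norm(features: Sequence[float]) -> float:
    """Euclidean norm of a flattened feature vector."""
    return math.sqrt(sum(x * x for x in features))


def modality_share(rgb_feat: Sequence[float], thermal_feat: Sequence[float]) -> float:
    """Fraction of the total feature norm contributed by the RGB stream.

    0.5 means both streams contribute equally; values near 0 or 1 mean
    one modality dominates the fused representation.
    """
    rgb = l2_norm(rgb_feat)
    thermal = l2_norm(thermal_feat)
    total = rgb + thermal
    if total == 0.0:
        return 0.5  # neither stream carries signal: treat as balanced
    return rgb / total


def balance_penalty(rgb_feat: Sequence[float], thermal_feat: Sequence[float]) -> float:
    """Hypothetical regularizer: squared deviation of the RGB share from 0.5."""
    share = modality_share(rgb_feat, thermal_feat)
    return (share - 0.5) ** 2


def total_loss(
    detection_loss: float,
    rgb_feat: Sequence[float],
    thermal_feat: Sequence[float],
    lam: float = 0.1,  # assumed weight, not taken from the paper
) -> float:
    """Detection loss plus the weighted (hypothetical) balance term."""
    return detection_loss + lam * balance_penalty(rgb_feat, thermal_feat)


if __name__ == "__main__":
    # Balanced streams: both norms are 5.0, so the penalty vanishes.
    print(balance_penalty([3.0, 4.0], [4.0, 3.0]))
    # RGB stream much weaker than thermal: the penalty grows.
    print(balance_penalty([0.3, 0.4], [3.0, 4.0]))
    print(total_loss(1.0, [0.3, 0.4], [3.0, 4.0]))
```

With feature vectors `[3.0, 4.0]` and `[4.0, 3.0]` both norms are 5.0, so the penalty is 0.0; shrinking the RGB vector to `[0.3, 0.4]` drops its share to 1/11 and the penalty rises to about 0.167. The squared deviation is zero only at an even split and treats either stream dominating symmetrically.
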