Abstract: Existing 3D object detection frameworks for sensor-based applications rely heavily on large-scale annotated data to achieve optimal performance. However, annotating sensor data, such as that from LiDAR or image sensors, is both time-consuming and costly. Semi-supervised learning offers an efficient solution to this challenge and holds significant potential for sensor-driven artificial intelligence (AI) applications. Although it reduces the need for labeled data, semi-supervised learning still depends on a small number of labeled samples for training, and in the initial stages such limited supervision can hinder the effective training of student–teacher networks. In this paper, we propose PE-MCAT, a semi-supervised 3D object detection method that generates high-precision pseudo-labels. First, to address insufficient local feature capture and poor robustness in point cloud data, we introduce a point enrichment module. This module incorporates information from image sensors and combines multiple methods for fusing local features and self-features to directly enhance the quality of point clouds and pseudo-labels, compensating for the limitations of training with only a few labeled samples. Second, we explore the relationship between the teacher network and the pseudo-labels it generates. We propose a multi-class adaptive threshold strategy that performs an initial filtering step to create a high-quality pseudo-label set, and we introduce a joint variable threshold strategy that refines this set further, improving the selection of superior pseudo-labels. Extensive experiments demonstrate that PE-MCAT consistently outperforms recent state-of-the-art methods across different datasets. Specifically, on the KITTI dataset with only 2% of the samples labeled, our method improves the mean Average Precision (mAP) by 0.7% for cars, 3.7% for pedestrians, and 3.0% for cyclists.
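As a rough illustration of what class-wise pseudo-label filtering can look like, the sketch below keeps a teacher prediction only if its confidence reaches a threshold set separately for each class. The abstract does not say how PE-MCAT computes its thresholds, so the rule used here (the mean teacher confidence of a class, clamped to a floor) and all names (`adaptive_thresholds`, `filter_pseudo_labels`, `floor`) are assumptions made for illustration, not the paper's method.

```python
from collections import defaultdict
from statistics import mean


def adaptive_thresholds(predictions, floor=0.5):
    """Hypothetical rule: per-class threshold = max(floor, mean confidence of that class)."""
    scores_by_class = defaultdict(list)
    for label, score in predictions:
        scores_by_class[label].append(score)
    return {label: max(floor, mean(scores)) for label, scores in scores_by_class.items()}


def filter_pseudo_labels(predictions, thresholds):
    """Keep teacher predictions whose confidence reaches their class threshold."""
    return [(label, score) for label, score in predictions if score >= thresholds[label]]


# Toy teacher outputs as (class, confidence) pairs; the values are made up.
teacher_predictions = [
    ("car", 0.95), ("car", 0.85), ("car", 0.60),
    ("pedestrian", 0.70), ("pedestrian", 0.40),
    ("cyclist", 0.30),
]
thresholds = adaptive_thresholds(teacher_predictions)
pseudo_labels = filter_pseudo_labels(teacher_predictions, thresholds)
print(pseudo_labels)
```

In this toy setup, a class with confident predictions (cars) ends up with a stricter cutoff than a harder class (pedestrians), while the floor prevents a class with uniformly weak predictions (cyclists) from admitting low-quality pseudo-labels. A second refinement stage, such as the joint variable threshold the abstract mentions, would operate on the filtered set and is not modeled here.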

PE-MCAT: Leveraging Image Sensor Fusion and Adaptive Thresholds for Semi-Supervised 3D Object Detection
