Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding

Yuhang Liu, Boyi Sun, Guixu Zheng, Yishuo Wang, Jing Wang, Fei-Yue Wang

2024-05-24

Abstract: LiDAR sensors play a crucial role in various applications, especially in autonomous driving. Current research primarily focuses on optimizing perceptual models with point cloud data as input, while the exploration of deeper cognitive intelligence remains relatively limited. To address this challenge, parallel LiDARs have emerged as a novel theoretical framework for the next-generation intelligent LiDAR systems, which tightly integrate physical, digital, and social systems. To endow LiDAR systems with cognitive capabilities, we introduce the 3D visual grounding task into parallel LiDARs and present a novel human-computer interaction paradigm for LiDAR systems. We propose Talk2LiDAR, a large-scale benchmark dataset tailored for 3D visual grounding in autonomous driving. Additionally, we present a two-stage baseline approach and an efficient one-stage method named BEVGrounding, which significantly improves grounding accuracy by fusing coarse-grained sentence and fine-grained word embeddings with visual features. Our experiments on Talk2Car-3D and Talk2LiDAR datasets demonstrate the superior performance of BEVGrounding, laying a foundation for further research in this domain.

Computer Vision and Pattern Recognition, Human-Computer Interaction

What problem does this paper attempt to address?

The paper asks how to endow LiDAR systems with cognitive abilities through the 3D visual grounding task in autonomous driving scenarios, that is, how to accurately locate a target object in a 3D scene from a textual description. Current research mainly focuses on optimizing perception models, while exploration of deeper cognitive intelligence remains limited. To tackle this, the paper builds on parallel LiDARs, a theoretical framework for next-generation intelligent LiDAR systems that tightly integrates physical, digital, and social systems. Its main contributions are:

1. **Introducing the 3D visual grounding task into parallel LiDARs**, which endows LiDAR systems with cognitive abilities through human-computer interaction.
2. **Proposing Talk2LiDAR, a large-scale benchmark dataset** designed for 3D visual grounding in autonomous driving.
3. **Developing a two-stage baseline method and an efficient one-stage method, BEVGrounding.** The latter significantly improves grounding accuracy by fusing coarse-grained sentence embeddings and fine-grained word embeddings with visual features.

Together, these contributions aim to advance research on 3D visual grounding in autonomous driving, through both dataset creation and method development, and to lay a foundation for future work.
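The idea of fusing coarse-grained sentence embeddings and fine-grained word embeddings with visual features can be illustrated with a toy sketch. This is not the BEVGrounding architecture, whose details are not given here. It assumes, purely for illustration, that the sentence embedding gates the channels of each bird's-eye-view (BEV) cell and that each cell then attends over the word embeddings. The names `fuse_cell` and `fuse_bev`, the sigmoid gate, and the dot-product attention are all hypothetical choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_cell(cell, sentence, words):
    """Fuse one BEV cell feature with sentence- and word-level text embeddings.

    Hypothetical scheme (an assumption, not the paper's method):
      1. coarse-grained: a sigmoid gate from the sentence embedding scales each channel;
      2. fine-grained: the gated cell attends over the word embeddings,
         and the attended word vector is added back as a residual.
    """
    gated = [v / (1.0 + math.exp(-s)) for v, s in zip(cell, sentence)]
    scale = math.sqrt(len(cell))
    weights = softmax([dot(gated, w) / scale for w in words])
    attended = [
        sum(wt * w[i] for wt, w in zip(weights, words))
        for i in range(len(cell))
    ]
    fused = [g + a for g, a in zip(gated, attended)]
    return fused, weights


def fuse_bev(bev, sentence, words):
    """Apply fuse_cell to every cell of a BEV grid given as rows of feature vectors."""
    return [[fuse_cell(cell, sentence, words)[0] for cell in row] for row in bev]


# Toy inputs: a 2x2 BEV grid with 2-channel features, one sentence vector, two word vectors.
bev = [[[1.0, 0.0], [0.0, 1.0]],
       [[0.5, 0.5], [1.0, 1.0]]]
sentence = [0.0, 0.0]
words = [[1.0, 0.0], [0.0, 1.0]]
fused_grid = fuse_bev(bev, sentence, words)
```

In a real model the gate and the attention would use learned projections, and a detection head would decode a 3D box from the fused BEV map. The sketch only shows how the two text granularities can enter the visual features at different points.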

Deep Learning for LiDAR-only and LiDAR-fusion 3D Perception: a Survey

Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

Software-Defined Active LiDARs for Autonomous Driving: A Parallel Intelligence-Based Adaptive Model

Study of a Multi-Beam LiDAR Perception Assessment Model for Real-Time Autonomous Driving

Open 3D World in Autonomous Driving

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

3D Object Detection for Point Cloud in Virtual Driving Environment

On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR

GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe

LidaRefer: Outdoor 3D Visual Grounding for Autonomous Driving with Transformers

LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

HPL-ViT: A Unified Perception Framework for Heterogeneous Parallel LiDARs in V2V

Is Your LiDAR Placement Optimized for 3D Scene Understanding?

Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development

Ground-Aware Point Cloud Semantic Segmentation for Autonomous Driving

3D Point Cloud Processing and Learning for Autonomous Driving

Influence of Camera-LiDAR Configuration on 3D Object Detection for Autonomous Driving

Ground-distance Segmentation of 3D LiDAR Point Cloud Toward Autonomous Driving