Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving

Zhenjiang Mao, Dong-You Jhong, Ao Wang, Ivan Ruchkin

2024-05-03

Abstract: Out-of-distribution (OOD) detection is essential in autonomous driving to determine when learning-based components encounter unexpected inputs. Traditional detectors typically use encoder models with fixed settings and therefore lack effective human interaction capabilities. With the rise of large foundation models, multimodal inputs make it possible to use human language as a latent representation, thus enabling language-defined OOD detection. In this paper, we use the cosine similarity of image and text representations encoded by the multimodal model CLIP as a new representation to improve the transparency and controllability of latent encodings used for visual anomaly detection. We compare our approach with existing pre-trained encoders that can only produce latent representations that are meaningless from the user's standpoint. Our experiments on realistic driving data show that the language-based latent representation performs better than the traditional representation of the vision encoder and helps improve detection performance when combined with standard representations.
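A minimal sketch of the language-based latent representation described in the abstract, assuming the image and text embeddings have already been produced by CLIP's image and text encoders (that encoding step is not shown). Each coordinate of the new latent vector is the cosine similarity between the image embedding and one text prompt. The helper `combined_latent` is a hypothetical illustration of "combining with standard representations" by plain concatenation; the authors' actual fusion may differ. Toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def language_latent(image_emb: Sequence[float],
                    prompt_embs: Sequence[Sequence[float]]) -> List[float]:
    """One coordinate per text prompt: how strongly the image matches that prompt."""
    return [cosine_similarity(image_emb, t) for t in prompt_embs]


def combined_latent(image_emb: Sequence[float],
                    prompt_embs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion (assumption): append the language coordinates
    to the raw image embedding."""
    return list(image_emb) + language_latent(image_emb, prompt_embs)


# Toy stand-ins for CLIP embeddings (real ones are much higher-dimensional).
image = [1.0, 0.0, 0.0]
prompts = [
    [2.0, 0.0, 0.0],  # hypothetical embedding of "a clear, bright road"
    [0.0, 2.0, 0.0],  # hypothetical embedding of "a foggy road"
    [3.0, 4.0, 0.0],  # hypothetical embedding of "a road at night"
]
print(language_latent(image, prompts))  # [1.0, 0.0, 0.6]
print(len(combined_latent(image, prompts)))  # 6
```

Because every coordinate is tied to a human-readable prompt, a user can read the vector directly (high similarity to "a clear, bright road", none to "a foggy road"), which is the transparency argument the abstract makes against opaque encoder features.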

Computer Vision and Pattern Recognition, Machine Learning, Robotics

What problem does this paper attempt to address?

This paper addresses a key problem in autonomous driving: how to detect out-of-distribution (OOD) inputs effectively. Traditional OOD detection methods rely on encoder models with fixed settings. These models offer no effective way for humans to interact with them and cannot be adjusted to the specific needs of users. With the rise of large foundation models, multimodal inputs make it possible to use human language as a latent representation, enabling language-defined OOD detection. The paper proposes using the cosine similarity between image and text representations, encoded by the multimodal model CLIP, as the latent encoding for visual anomaly detection, in order to make that encoding more transparent and controllable. According to the authors, this improves detection performance and also increases the user's trust in and control over the system.

The main contributions of the paper are:
1. A new language-guided OOD detection technique that gives end-users more transparency and control.
2. Extensive experiments on photo-realistic simulation data evaluating how different language encodings perform in OOD detection.

With this method, users can specify the phenomena they care about. For example, a driver can specify that the vehicle is expected to see a clear, bright and open road, and any deviation from this standard should be treated as an OOD input. This makes anomaly detection more flexible and transparent, which matters especially from the end-user's perspective.
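One simple way to turn prompt-similarity vectors into an OOD flag is sketched below. The detector is an assumption chosen for illustration, not necessarily the one used in the paper: a 1-nearest-neighbour distance to in-distribution latents, thresholded at a quantile of leave-one-out scores on the training set. The prompt list (including the "clear, bright and open road" expectation from the example above) and all numeric values are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical prompts a driver might supply; each defines one latent coordinate.
PROMPTS = ["a clear, bright and open road", "a foggy road", "a road at night"]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(z: Sequence[float], train: List[List[float]], k: int = 1) -> float:
    """Mean distance from z to its k nearest in-distribution latents
    (higher means more unusual)."""
    dists = sorted(euclidean(z, t) for t in train)
    return sum(dists[:k]) / k


def fit_threshold(train: List[List[float]], k: int = 1, quantile: float = 0.95) -> float:
    """Score each training latent against the others (leave-one-out),
    then use the given quantile of those scores as the OOD threshold."""
    scores = sorted(
        knn_score(train[i], train[:i] + train[i + 1:], k) for i in range(len(train))
    )
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def is_ood(z: Sequence[float], train: List[List[float]], threshold: float, k: int = 1) -> bool:
    return knn_score(z, train, k) > threshold


# Hypothetical in-distribution similarity vectors, one value per prompt above.
train_latents = [
    [0.30, 0.12, 0.10],
    [0.31, 0.11, 0.12],
    [0.29, 0.13, 0.11],
    [0.32, 0.12, 0.10],
]
assert all(len(z) == len(PROMPTS) for z in train_latents)

threshold = fit_threshold(train_latents)
foggy_frame = [0.18, 0.29, 0.14]     # weak match to "clear road", strong to "foggy"
normal_frame = [0.305, 0.12, 0.105]  # close to the training data
print(is_ood(foggy_frame, train_latents, threshold))   # True
print(is_ood(normal_frame, train_latents, threshold))  # False
```

Since the distance is measured in the prompt-defined space, editing the prompt list changes what counts as a deviation, which mirrors the user-specified behaviour described above.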
