Abstract: Traffic anomaly detection (TAD) in driving videos is critical for ensuring the safety of autonomous driving and advanced driver assistance systems. Previous single-stage TAD methods primarily rely on frame prediction, making them vulnerable to interference from dynamic backgrounds induced by the rapid movement of the dashboard camera. While two-stage TAD methods appear to be a natural solution to mitigate such interference by pre-extracting background-independent features (such as bounding boxes and optical flow) using perceptual algorithms, they depend on the performance of the first-stage perceptual algorithms, which may result in error propagation. In this paper, we introduce TTHF, a novel single-stage method aligning video clips with text prompts, offering a new perspective on traffic anomaly detection. Unlike previous approaches, the supervision signal of our method is derived from language rather than orthogonal one-hot vectors, providing a more comprehensive representation. Further, concerning visual representation, we propose to model the high frequency of driving videos in the temporal domain. This modeling captures the dynamic changes of driving scenes, enhances the perception of driving behavior, and significantly improves the detection of traffic anomalies. In addition, to better perceive various types of traffic anomalies, we carefully design an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest, thereby facilitating the detection of traffic anomalies. It is shown that our proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset and achieving high generalization on the DADA dataset.
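
The abstract does not specify how the temporal high frequency is computed. As a rough illustration only, one common way to isolate high-frequency temporal content is to difference adjacent frames, which acts as a simple high-pass filter along the time axis. The sketch below assumes this frame-differencing stand-in; the function names `temporal_high_frequency` and `motion_energy` are hypothetical and are not taken from the paper.

```python
import numpy as np

def temporal_high_frequency(clip: np.ndarray) -> np.ndarray:
    """Difference adjacent frames of a clip shaped (T, H, W) or (T, H, W, C).

    Assumption: frame differencing is used here as a simple temporal
    high-pass filter; it is not claimed to be the paper's exact operator.
    """
    clip = np.asarray(clip, dtype=np.float64)
    if clip.shape[0] < 2:
        raise ValueError("need at least two frames to take a temporal difference")
    return clip[1:] - clip[:-1]

def motion_energy(clip: np.ndarray) -> np.ndarray:
    """Mean absolute frame difference per transition, shape (T - 1,)."""
    diff = temporal_high_frequency(clip)
    return np.abs(diff).reshape(diff.shape[0], -1).mean(axis=1)

# Toy clip: three identical dark frames, then a bright patch appears.
clip = np.zeros((4, 8, 8))
clip[3, 2:4, 2:4] = 1.0

energy = motion_energy(clip)
print(energy)  # only the last transition carries non-zero energy
```

In this toy clip the static background contributes nothing to the differences, while the abrupt change in the last frame stands out, which is the intuition behind emphasizing dynamic changes in the scene.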

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses Traffic Anomaly Detection (TAD) for autonomous driving and advanced driver assistance systems. Specifically, it focuses on detecting traffic anomalies such as vehicle collisions and loss of control from a first-person driving perspective. Existing traffic anomaly detection methods fall into two paradigms:

1. **Single-Stage Paradigm**: These methods primarily rely on frame prediction. Because the rapid movement of dashboard cameras produces a dynamic background that interferes with prediction, they perform poorly in detecting traffic anomalies.
2. **Two-Stage Paradigm**: These methods first extract background-independent features (such as bounding boxes and optical flow) and then perform traffic anomaly detection. However, they are highly sensitive to the performance of the first-stage perception algorithms, so first-stage errors can propagate and degrade detection results.

To address these issues, the paper proposes a new single-stage method, Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling (TTHF). The method improves on existing traffic anomaly detection techniques through the following innovations:

1. **Text-Driven**: Unlike previous methods, TTHF aligns video clips with text prompts and uses natural language as the supervisory signal instead of orthogonal one-hot vectors, providing a more comprehensive representation.
2. **Temporal High-Frequency Modeling**: TTHF models high-frequency information in the temporal dimension to capture dynamic changes in driving scenes and enhance the perception of driving behavior.
3. **Attention Mechanism**: To better perceive various types of traffic anomalies, TTHF includes an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest.

Through these innovations, TTHF outperforms state-of-the-art competitors by +5.4% AUC on the DoTA dataset and generalizes well to the DADA dataset.
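
The text-driven supervision described above can be illustrated with a minimal sketch. It assumes a CLIP-style setup in which a clip embedding is compared with text prompt embeddings by cosine similarity, and the anomaly score is the softmax probability mass assigned to the anomalous prompts. The prompt wording, the hand-made 3-dimensional embeddings, the temperature of 0.07, and the function names are all illustrative assumptions; in a real system the embeddings would come from trained video and text encoders, which are not shown here.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def prompt_probabilities(clip_emb, prompt_embs, temperature: float = 0.07):
    """Softmax over cosine similarities between one clip and N prompts."""
    clip_emb = l2_normalize(np.asarray(clip_emb, dtype=np.float64))
    prompt_embs = l2_normalize(np.asarray(prompt_embs, dtype=np.float64))
    logits = prompt_embs @ clip_emb / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def anomaly_score(clip_emb, prompt_embs, is_anomalous, temperature: float = 0.07):
    """Probability mass assigned to the prompts flagged as anomalous."""
    probs = prompt_probabilities(clip_emb, prompt_embs, temperature)
    return float(probs[np.asarray(is_anomalous, dtype=bool)].sum())

# Hypothetical prompts and placeholder embeddings (not from the paper).
prompts = ["normal driving", "a vehicle collision", "a vehicle out of control"]
is_anomalous = [False, True, True]
prompt_embs = np.eye(3)

collision_clip = np.array([0.1, 0.9, 0.2])  # closest to the collision prompt
normal_clip = np.array([0.9, 0.1, 0.2])     # closest to the normal prompt

probs = prompt_probabilities(collision_clip, prompt_embs)
score_collision = anomaly_score(collision_clip, prompt_embs, is_anomalous)
score_normal = anomaly_score(normal_clip, prompt_embs, is_anomalous)
print(prompts[int(probs.argmax())], score_collision, score_normal)
```

Compared with one-hot labels, which treat every class as equally distant from every other, prompt embeddings can place related anomaly descriptions close together, which is one plausible reading of the "more comprehensive representation" claimed for language supervision.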

Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos

Anomaly Candidate Identification and Starting Time Estimation of Vehicles from Traffic Videos

DoTA: Unsupervised Detection of Traffic Anomaly in Driving Videos

Automatic Traffic Real-Time Analysis System Based on Video

From Time to Space: Automatic Annotation of Unmarked Traffic Scene Based on Trajectory Data

Progressive Temporal-Spatial-Semantic Analysis of Driving Anomaly Detection and Recounting

When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos

A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos

Anomaly Detection in Traffic Surveillance Videos with GAN-based Future Frame Prediction

DiffTAD: Denoising diffusion probabilistic models for vehicle trajectory anomaly detection

Anomaly Detection in Traffic Scenes via Spatial-aware Motion Reconstruction

Challenges in Time-Stamp Aware Anomaly Detection in Traffic Videos

DSTANet: learning a dual-stream model for anomaly driving action detection using spatio-temporal and appearance features

Good Practices and A Strong Baseline for Traffic Anomaly Detection

Unsupervised Traffic Accident Detection in First-Person Videos

Anomalous Motion Detection on Highway Using Deep Learning

AAD: Adaptive Anomaly Detection through traffic surveillance videos

Improved Dynamic Spatial-Temporal Attention Network for Early Anticipation of Traffic Accidents

Configurable Spatial-Temporal Hierarchical Analysis for Flexible Video Anomaly Detection

Dual-Modality Vehicle Anomaly Detection via Bilateral Trajectory Tracing

TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance