Abstract: Computer vision has advanced research methodologies and improved system services across many fields. It is a core component of traffic monitoring systems that aim to improve road safety; however, these systems do not preserve the privacy of pedestrians who appear in the videos, potentially revealing their identities. To address this issue, our paper introduces Video-to-Text Pedestrian Monitoring (VTPM), which monitors pedestrian movements at intersections and generates real-time textual reports that include traffic signal and weather information. VTPM uses computer vision models for pedestrian detection and tracking, achieving a latency of 0.05 seconds per video frame. By incorporating traffic signal data, it also detects crossing violations with 90.2% accuracy. The framework uses Phi-3 mini-4k to generate real-time textual reports of pedestrian activity, with a latency of 0.33 seconds, that state safety concerns such as crossing violations, conflicts, and the impact of weather on pedestrian behavior. To support deeper analysis, Phi-3 medium is fine-tuned for historical analysis of the generated textual reports. This fine-tuning enables more reliable analysis of pedestrian safety at intersections, effectively detecting patterns and safety-critical events. The proposed VTPM offers a more efficient alternative to video footage: its textual reports reduce memory usage (a reported saving of up to 253 million percent), eliminate privacy issues, and enable comprehensive, interactive historical analysis.
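To make the role of traffic signal data in violation detection more concrete, here is a minimal sketch, not the paper's implementation. It assumes the tracker outputs one foot-point per pedestrian in image coordinates, that the crosswalk is a hand-drawn polygon, and that the pedestrian signal is available as a simple string. The names `TrackedPedestrian`, `point_in_polygon`, `find_violations`, and the signal labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TrackedPedestrian:
    """Hypothetical tracker output: an ID plus the pedestrian's foot-point."""
    track_id: int
    x: float
    y: float


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_violations(
    pedestrians: List[TrackedPedestrian],
    crosswalk: List[Point],
    signal_state: str,
) -> List[int]:
    """Return track IDs of pedestrians inside the crosswalk while the signal is not WALK."""
    if signal_state == "WALK":  # assumed label for the pedestrian walk phase
        return []
    return [
        p.track_id
        for p in pedestrians
        if point_in_polygon(p.x, p.y, crosswalk)
    ]


if __name__ == "__main__":
    crosswalk = [(0, 0), (10, 0), (10, 4), (0, 4)]
    frame = [TrackedPedestrian(1, 5, 2), TrackedPedestrian(2, 12, 2)]
    print(find_violations(frame, crosswalk, "DONT_WALK"))
```

A per-frame check like this is cheap compared with detection and tracking, so it would not be expected to change the reported per-frame latency much; a real system would also need to handle signal timing offsets and pedestrians who entered the crosswalk legally.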

What problem does this paper attempt to address?

The paper aims to address the following key issues:

### Core Issues

- **Privacy Protection**: Existing traffic monitoring systems fail to protect pedestrian privacy when monitoring pedestrian activity, potentially exposing identity information in videos.
- **Data Storage and Analysis**: The large volume of data generated by video surveillance leads to high storage costs and makes long-term historical analysis difficult.

### Solution

The paper proposes a new method called Video-to-Text Pedestrian Monitoring (VTPM), which combines computer vision and large language models (LLMs) to achieve the following goals:

1. **Privacy Protection**: Converting videos into text reports removes the information in the video that may reveal personal identities, thereby protecting pedestrian privacy.
2. **Efficient Data Storage**: Text reports require far less storage than video files.
3. **Real-time Monitoring and Historical Analysis**: A fast language model generates real-time text reports, and a more powerful language model analyzes the historical reports in depth to identify patterns and safety issues.

### Technical Details

- Use computer vision models to detect and track pedestrians and potential conflicts with vehicles.
- Integrate weather information to assess the impact of environmental factors on pedestrian behavior.
- Employ Microsoft's Phi-3-mini model (3.8 billion parameters) for real-time text report generation.
- Use Microsoft's Phi-3-medium model (14 billion parameters) for historical data analysis.
- Apply LoRA to fine-tune the language models and improve performance in this specific domain.

### Experimental Results

- The system demonstrated low-latency processing (0.05 seconds per frame), supporting real-time video processing.
- Report generation takes 0.33 seconds per report.
- Pedestrian violation detection accuracy reached 90.2%, and the system generates comprehensive reports on pedestrian activity in a timely manner.
- The paper analyzes pedestrian violations under different conditions, such as day versus night and sunny versus rainy weather.

In summary, VTPM aims to improve both the functionality and the privacy protection of pedestrian monitoring systems, while reducing data storage requirements and providing an effective means of historical data analysis.
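The video-to-text step summarized above can be illustrated with a short sketch of how anonymized per-interval statistics might be turned into a prompt for the report-generating model. This is an assumption-laden illustration rather than the paper's pipeline: the `IntervalSummary` fields, the prompt wording, and both function names are hypothetical, and no language model is actually called. The second helper shows one possible reading of a storage "saving percentage" (size difference relative to the text size); the summary does not state how the paper defines its figure.

```python
from dataclasses import dataclass


@dataclass
class IntervalSummary:
    """Hypothetical anonymized statistics for one monitoring interval."""
    start: str
    end: str
    pedestrian_count: int
    violations: int
    conflicts: int
    signal_state: str
    weather: str


def build_report_prompt(s: IntervalSummary) -> str:
    """Format the statistics as a prompt; only counts and context, no imagery or identities."""
    return (
        "Write a short pedestrian activity report for an intersection.\n"
        f"Interval: {s.start} to {s.end}\n"
        f"Pedestrians observed: {s.pedestrian_count}\n"
        f"Violations: {s.violations}\n"
        f"Pedestrian-vehicle conflicts: {s.conflicts}\n"
        f"Signal state: {s.signal_state}\n"
        f"Weather: {s.weather}\n"
        "Mention any safety concerns."
    )


def storage_saving_percent(video_bytes: int, text_bytes: int) -> float:
    """Assumed definition: size difference expressed relative to the text size."""
    return (video_bytes - text_bytes) / text_bytes * 100


if __name__ == "__main__":
    summary = IntervalSummary(
        start="08:00", end="08:05", pedestrian_count=14,
        violations=2, conflicts=1, signal_state="DONT_WALK", weather="rain",
    )
    prompt = build_report_prompt(summary)
    print(prompt)
    # Illustrative sizes only: a made-up 50 MB clip versus the prompt text above.
    print(storage_saving_percent(50_000_000, len(prompt.encode("utf-8"))))
```

Because the prompt carries only aggregate counts and context, nothing that identifies an individual pedestrian needs to be stored once the report is written, which is the privacy argument the summary makes.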

Video-to-Text Pedestrian Monitoring (VTPM): Leveraging Computer Vision and Large Language Models for Privacy-Preserve Pedestrian Activity Monitoring at Intersections

A Novel Approach to Design the Fast Pedestrian Detection for Video Surveillance System

Automatic Traffic Real-Time Analysis System Based on Video

Real-time Pedestrian Crossing Lights Detection Algorithm for the Visually Impaired

Fast Pedestrian Detection And Tracking Based On Vibe Combined Hog-Svm Scheme

Recognition and Co-Analysis of Pedestrian Activities in Different Parts of Road using Traffic Camera Video

Pedestrian movement trajectory reappearance and crossing feature expression based on video processing

Representation and Analysis of Pedestrian Crossing States Based on Video Tracking

Dynamic Video Streaming Real-time Headcount Based on Improved YOLO V5 for Open Plazas

GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction

Pedestrian Safety Analysis in Mixed Traffic Conditions Using Video Data

Vision-Based Potential Pedestrian Risk Analysis on Unsignalized Crosswalk Using Data Mining Techniques

A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time

REVAMP$^2$T: Real-time Edge Video Analytics for Multi-camera Privacy-aware Pedestrian Tracking

Research on Risk Analysis of Pedestrian and Vehicle Flow in Community Scenes Based on Machine Vision

A context-aware pedestrian trajectory prediction framework for automated vehicles

WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding

Tannic acid affects the phenotype of Staphylococcus aureus resistant to tetracycline and erythromycin by inhibition of efflux pumps.

A Real-Time Predictive Pedestrian Collision Warning Service for Cooperative Intelligent Transportation Systems Using 3D Pose Estimation

A Novel Method for Tracking Pedestrians from Real-Time Video.

Action-ViT: Pedestrian Intent Prediction in Traffic Scenes