Video-to-Text Pedestrian Monitoring (VTPM): Leveraging Computer Vision and Large Language Models for Privacy-Preserve Pedestrian Activity Monitoring at Intersections

Ahmed S. Abdelrahman, Mohamed Abdel-Aty, Dongdong Wang
2024-08-21
Abstract: Computer vision has advanced research methodologies, enhancing system services across various fields. It is a core component in traffic monitoring systems for improving road safety; however, these monitoring systems do not preserve the privacy of pedestrians who appear in the videos, potentially revealing their identities. Addressing this issue, our paper introduces Video-to-Text Pedestrian Monitoring (VTPM), which monitors pedestrian movements at intersections and generates real-time textual reports, including traffic signal and weather information. VTPM uses computer vision models for pedestrian detection and tracking, achieving a latency of 0.05 seconds per video frame. Additionally, it detects crossing violations with 90.2% accuracy by incorporating traffic signal data. The proposed framework is equipped with Phi-3 mini-4k to generate real-time textual reports of pedestrian activity, stating safety concerns such as crossing violations, conflicts, and the impact of weather on pedestrian behavior, with a latency of 0.33 seconds. To support comprehensive analysis of the generated textual reports, Phi-3 medium is fine-tuned for historical analysis of these reports. This fine-tuning enables more reliable analysis of pedestrian safety at intersections, effectively detecting patterns and safety-critical events. The proposed VTPM offers a more efficient alternative to video footage by using textual reports, which reduces memory usage (saving up to 253 million percent), eliminates privacy issues, and enables comprehensive interactive historical analysis.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address the following key issues:

### Core Issues

- **Privacy Protection**: Existing traffic monitoring systems do not protect pedestrian privacy when monitoring pedestrian activity, so the video footage can expose pedestrians' identities.
- **Data Storage and Analysis**: The large volume of data generated by video surveillance leads to high storage costs and makes long-term historical analysis difficult.

### Solution

The paper proposes a new method called Video-to-Text Pedestrian Monitoring (VTPM), which combines computer vision and large language models (LLMs) to achieve the following goals:

1. **Privacy Protection**: Converting videos into text reports removes the visual information that could reveal personal identities, thereby protecting pedestrian privacy.
2. **Efficient Data Storage**: Text reports require far less storage than video files.
3. **Real-time Monitoring and Historical Analysis**: A fast, small language model generates text reports in real time, and a larger language model analyzes the accumulated reports to identify patterns and safety issues.

### Technical Details

- Use computer vision models to detect and track pedestrians and to identify potential conflicts with vehicles.
- Integrate weather information to assess the impact of environmental factors on pedestrian behavior.
- Employ Microsoft's phi-3-mini model (3.8 billion parameters) for real-time text report generation.
- Use Microsoft's phi-3-medium model (14 billion parameters) for historical data analysis.
- Apply LoRA (low-rank adaptation) to fine-tune the language models for better performance in this domain.

### Experimental Results

- The system processes video with low latency (0.05 seconds per frame), supporting real-time operation.
- Report generation takes 0.33 seconds per report.
- Pedestrian crossing-violation detection reached 90.2% accuracy, and the system generates comprehensive reports on pedestrian activity in a timely manner.
- The paper analyzes pedestrian violations under different conditions, such as differences in behavior between day and night and between sunny and rainy weather.

In summary, VTPM aims to improve both the functionality and the privacy protection of pedestrian monitoring systems, while reducing data storage requirements and providing an effective means of historical data analysis.
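The idea of detecting crossing violations by combining tracked pedestrians with traffic signal data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `PedSignal`, `TrackObservation`, `is_violation`, and `summarize`, the rule "inside the crosswalk while the pedestrian signal shows don't-walk", and the report wording are all assumptions made for illustration. In VTPM the report text itself is produced by a language model; here a fixed template stands in for that step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class PedSignal(Enum):
    """Hypothetical pedestrian signal states."""
    WALK = "walk"
    DONT_WALK = "dont_walk"


@dataclass
class TrackObservation:
    """One per-frame observation of a tracked pedestrian (hypothetical schema)."""
    track_id: int       # anonymous tracker ID, carries no identity information
    timestamp: float    # seconds since the start of the video
    in_crosswalk: bool  # True if the pedestrian is inside the crosswalk zone


def is_violation(obs: TrackObservation, signal: PedSignal) -> bool:
    # Assumed rule: being in the crosswalk during DONT_WALK is a violation.
    return obs.in_crosswalk and signal is PedSignal.DONT_WALK


def summarize(
    observations: Iterable[TrackObservation],
    signal_at: Callable[[float], PedSignal],
    weather: str,
) -> str:
    """Build a short text summary; no pixels or images are retained."""
    obs_list = list(observations)
    pedestrians = {o.track_id for o in obs_list}
    violators = {
        o.track_id for o in obs_list if is_violation(o, signal_at(o.timestamp))
    }
    return (
        f"Weather: {weather}. Pedestrians observed: {len(pedestrians)}. "
        f"Crossing violations: {len(violators)}."
    )


if __name__ == "__main__":
    # Toy data: the signal is WALK for the first 5 seconds, then DONT_WALK.
    def signal_at(t: float) -> PedSignal:
        return PedSignal.WALK if t < 5 else PedSignal.DONT_WALK

    observations = [
        TrackObservation(1, 0.0, True),
        TrackObservation(2, 6.0, True),
        TrackObservation(3, 6.5, False),
    ]
    print(summarize(observations, signal_at, "rain"))
```

Only anonymous track IDs and counts reach the text output, which is the property that lets text reports replace stored video in the approach described above.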