Abstract: Video analytics is widely used in contemporary systems and services. At the forefront of video analytics are video queries that users develop to find objects of particular interest. Building upon the insight that video objects (e.g., humans, animals, cars), the center of video analytics, are similar in spirit to objects modeled by traditional object-oriented languages, we propose to develop an object-oriented approach to video analytics. This approach, named VQPy, consists of a frontend (a Python variant with constructs that make it easy for users to express video objects and their interactions) as well as an extensible backend that can automatically construct and optimize pipelines based on video objects. We have implemented and open-sourced VQPy, which has been productized in Cisco as part of its DeepVision framework.

What problem does this paper attempt to address?

The paper addresses the difficulty of writing and optimizing complex queries in modern video analytics. Surveillance cameras and online video platforms produce large volumes of video data, and intelligent analysis of that data underpins applications such as smart-city security, traffic management, and autonomous driving. Video querying is the core of video analytics: users search for objects or events of specific interest, such as traffic violations or suspicious activities.

Current approaches to video queries fall into three groups: manually constructed pipelines, SQL-like languages, and multi-modal language models. Each has limitations. Manually constructing pipelines is labor-intensive and error-prone, SQL-like frameworks are not adept at handling object-based video queries, and multi-modal language models are unsuitable for real-time or time-sensitive applications. The paper's main observation is that video queries focus on the objects in a video (such as people, animals, and vehicles) and on their spatial and temporal interactions, while existing technologies lack object-based abstractions, which makes complex queries difficult to describe and optimize.

The key insight is that video objects share essential similarities with objects in traditional object-oriented programming languages. A query framework built around video objects can therefore simplify the writing of complex queries and enable object-level optimizations that improve query performance. To this end, the paper introduces VQPy, a variant of Python designed for expressing video objects and their interactions in a user-friendly manner. The VQPy frontend uses an object-oriented approach to represent video objects and interactions, while the backend is based on an object-centric data model that automatically constructs and optimizes pipelines. VQPy also provides an extensible optimization framework into which various query optimizations can be integrated.

The paper evaluates VQPy on 14 queries and 5 real-world surveillance video stream datasets, reporting an average speedup of 10 times over existing systems without sacrificing accuracy. VQPy is open-source and has been productized in Cisco's DeepVision framework.
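
To make the object-oriented idea concrete, here is a minimal sketch in plain Python. It does not use VQPy's actual API: the names `VideoObject`, `Vehicle`, `speed`, and `speeding_vehicles` are hypothetical, and tracked positions are assumed to be given as one bounding-box center per frame. It only illustrates how a video object can be modeled as a class, extended by inheritance, and queried through a derived property.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class VideoObject:
    """A tracked object with one bounding-box center per frame (hypothetical model)."""
    track_id: int
    class_name: str
    centers: list[tuple[float, float]] = field(default_factory=list)


@dataclass
class Vehicle(VideoObject):
    """Subclass that adds a derived property on top of the generic object."""
    fps: float = 30.0  # assumed frame rate of the source video

    @property
    def speed(self) -> float:
        """Average displacement in pixels per second over the whole track."""
        if len(self.centers) < 2:
            return 0.0
        dist = sum(
            hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(self.centers, self.centers[1:])
        )
        return dist * self.fps / (len(self.centers) - 1)


def speeding_vehicles(objects, limit):
    """Query: vehicles whose derived speed exceeds `limit` (pixels per second)."""
    return [o for o in objects if isinstance(o, Vehicle) and o.speed > limit]
```

In this sketch the query is an ordinary filter over objects. A backend like the one the paper describes would instead be responsible for constructing and optimizing the detection and tracking pipeline that produces those objects.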

VQPy: An Object-Oriented Approach to Modern Video Analytics
