Abstract: Large-scale, high-quality datasets are the foundation for developing advanced artificial intelligence applications. However, creating such a benchmark dataset in a professional field, such as precision management of animals, has always been a challenge because of the costly and labor-intensive process of annotation and review. This study introduced a novel workflow named Accelerated Data Engine (ADE), designed to efficiently produce representative and high-quality computer vision datasets from raw animal surveillance footage. By incorporating referring and grounding models (R&G models) as auto-annotators, along with a distillation mechanism for dataset auditors, ADE significantly sped up the dataset construction process. The new workflow received natural language inputs as referrals to identify animal instances, delineated their body shapes, and then refined the auto-annotated data through a selection process. To demonstrate the efficacy of ADE, three 30-minute surveillance video samples featuring pigs, sheep, and cattle were discussed in this study. The results indicated that the R&G models effectively annotated animals across various farms, while the distillation mechanisms could identify various detection errors, balance the data representations, refine annotations, and verify the data quality. Two high-quality cattle datasets (6.5k and 486 frames), containing 26k and 2.5k cattle instances, respectively, were generated through the ADE workflow from 24-hour surveillance videos on a commercial cattle farm and made publicly available. The proposed dataset has achievable performance between 74.6% and 84.1%. The ADE workflow saved 78.4% of manual work compared to the traditional dataset construction workflow (approximately 141 h). This pioneering approach enables the fast creation of benchmark animal datasets and is expected to enhance computer vision applications in the livestock production industry in the future.

PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data

A Software for Rapid Annotation of Scene Objects Based on Saliency Object Ranking

Annotation-free Audio-Visual Segmentation

Each Perform Its Functions: Task Decomposition and Feature Assignment for Audio-Visual Segmentation

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores

Audio-Visual Segmentation

OpenAnnotate2: Multi-Modal Auto-Annotating for Autonomous Driving

Vision-Infused Deep Audio Inpainting

APES: Audiovisual Person Search in Untrimmed Video

PANDA: A Gigapixel-level Human-centric Video Dataset

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

Audio-Visual Segmentation with Semantics

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

VideoPro: A Visual Analytics Approach for Interactive Video Programming

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

Accelerated Data Engine: A Faster Dataset Construction Workflow for Computer Vision Applications in Commercial Livestock Farms

Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning

A perceptual manipulation system for audio-visual fusion of robots

AutoAD III: The Prequel -- Back to the Pixels