Multimodal Data Curation via Object Detection and Filter Ensembles

Tzu-Heng Huang, Changho Shin, Sui Jiet Tay, Dyah Adila, Frederic Sala
2024-01-05
Abstract: We propose an approach for curating multimodal data, which we used for our entry in the filtering track of the 2023 DataComp competition. Our technique combines object detection with weak-supervision-based ensembling. In the first of its two steps, an off-the-shelf zero-shot object detection model extracts fine-grained information from the data, which we use to build a variety of filter designs. In the second step, we use weak supervision to ensemble these filtering rules. This approach yields a 4% performance improvement over the best-performing baseline and took the top-ranking position in the small-scale track at the time of writing. In the medium-scale track, simply ensembling the existing baselines with weak supervision achieves a 4.2% improvement over the baseline.
Computer Vision and Pattern Recognition, Machine Learning
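The abstract describes a two-step pipeline (detector-derived filter rules, then a weak-supervision ensemble) without giving the rules or the label model. The sketch below illustrates that shape under explicit assumptions: detector output is given as hypothetical (object_label, confidence) pairs from a zero-shot detector that is not shown, the three rules and all thresholds (0.3, 0.5, 0.20, 0.30) are invented for illustration, and the ensembling step uses a simple one-coin EM label model (a Dawid-Skene-style simplification) as a stand-in for whatever weak-supervision label model the authors actually used. Each rule votes KEEP, DISCARD, or ABSTAIN; the label model estimates each rule's accuracy without ground-truth labels and combines the votes into a posterior probability of keeping each sample.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

KEEP, DISCARD, ABSTAIN = 1, -1, 0


@dataclass
class Sample:
    caption: str
    # Hypothetical detector output: (object_label, confidence) pairs, assumed to
    # come from a zero-shot object detector run on the image (not shown here).
    detections: List[Tuple[str, float]]
    # Hypothetical image-text similarity score (e.g. a CLIP-style score).
    clip_score: float


# --- Step 1: hypothetical filter rules built from detector output ---
def lf_caption_mentions_object(s: Sample) -> int:
    """KEEP if a confidently detected object is named in the caption."""
    labels = {lab for lab, conf in s.detections if conf >= 0.3}
    if not labels:
        return ABSTAIN  # nothing detected: this rule has no opinion
    words = set(s.caption.lower().split())
    return KEEP if labels & words else DISCARD


def lf_has_confident_object(s: Sample) -> int:
    """KEEP if the detector found at least one high-confidence object."""
    return KEEP if any(conf >= 0.5 for _, conf in s.detections) else DISCARD


def lf_clip_threshold(s: Sample) -> int:
    """Thresholded similarity score; abstain in the uncertain middle band."""
    if s.clip_score >= 0.30:
        return KEEP
    if s.clip_score < 0.20:
        return DISCARD
    return ABSTAIN


# --- Step 2: weak-supervision ensemble (simple one-coin EM label model) ---
def label_model(votes: Sequence[Sequence[int]], n_iter: int = 50) -> List[float]:
    """Return P(keep) per sample, learning each rule's accuracy without labels."""
    m = len(votes[0])
    acc = [0.7] * m  # initial accuracy guess (> 0.5 breaks label symmetry)
    prior = 0.5      # initial P(keep)
    post: List[float] = []
    for _ in range(n_iter):
        # E-step: posterior log-odds = prior log-odds + accuracy-weighted votes.
        post = []
        for row in votes:
            z = math.log(prior / (1 - prior))
            z += sum(v * math.log(a / (1 - a))
                     for v, a in zip(row, acc) if v != ABSTAIN)
            post.append(1 / (1 + math.exp(-z)))
        # M-step: a rule's accuracy is its expected agreement with the
        # posterior, computed only over samples where it did not abstain.
        for j in range(m):
            agree = [q if row[j] == KEEP else 1 - q
                     for row, q in zip(votes, post) if row[j] != ABSTAIN]
            if agree:
                acc[j] = min(0.95, max(0.05, sum(agree) / len(agree)))
        prior = min(0.99, max(0.01, sum(post) / len(post)))
    return post


def curate(samples: Sequence[Sample], rules: Sequence[Callable[[Sample], int]],
           threshold: float = 0.5) -> Tuple[List[int], List[float]]:
    """Apply every rule, ensemble the votes, and keep high-posterior samples."""
    votes = [[rule(s) for rule in rules] for s in samples]
    post = label_model(votes)
    kept = [i for i, q in enumerate(post) if q >= threshold]
    return kept, post


# Usage on a toy, made-up pool of image-text pairs.
samples = [
    Sample("a dog on the grass", [("dog", 0.8)], 0.35),
    Sample("buy now cheap", [], 0.12),
    Sample("a red car parked outside", [("car", 0.6)], 0.31),
    Sample("logo", [("person", 0.4)], 0.18),
    Sample("a cat sleeping", [("cat", 0.45)], 0.25),
]
rules = [lf_caption_mentions_object, lf_has_confident_object, lf_clip_threshold]
kept, post = curate(samples, rules)
print(kept, [round(q, 3) for q in post])
```

Letting rules abstain matters here: a detector-based rule has nothing to say about an image with no detections, and the label model simply ignores that vote instead of counting it as a rejection. In a real pipeline one would more likely keep a fixed top fraction of the pool by posterior rather than use a 0.5 cutoff.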