BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge,Yihe Tang,Jiashu Xu,Cem Gokmen,Chengshu Li,Wensi Ai,Benjamin Jose Martinez,Arman Aydin,Mona Anvari,Ayush K Chakravarthy,Hong-Xing Yu,Josiah Wong,Sanjana Srivastava,Sharon Lee,Shengxin Zha,Laurent Itti,Yunzhu Li,Roberto Martín-Martín,Miao Liu,Pengchuan Zhang,Ruohan Zhang,Li Fei-Fei,Jiajun Wu
2024-05-16
Abstract: The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction. Project website:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper introduces the BEHAVIOR Vision Suite (BVS), a set of tools and assets for generating customizable synthetic data to systematically evaluate and understand computer vision models. Real-world datasets rarely provide the comprehensive, customized labels needed to study models under varying conditions, and existing synthetic data generators are limited in asset and rendering quality, diversity, and physical realism. BVS is built on the newly developed embodied AI benchmark BEHAVIOR-1K and exposes a large number of adjustable parameters at the scene level (such as lighting and object placement), the object level (such as joint configuration and attributes like "filled" and "folded"), and the camera level (such as field of view and focal length). Researchers can vary these parameters freely during data generation to run controlled experiments. The paper demonstrates three applications: 1) evaluating model robustness along different continuous axes of domain shift; 2) evaluating scene understanding models on the same set of images; 3) training and evaluating simulation-to-real transfer for a novel vision task, unary and binary state prediction. BVS emphasizes high visual quality, physical plausibility, and extensive customization, and it provides rich annotations such as scene graphs, point clouds, and depth. It covers a wide range of indoor scenes and objects and supports physical interaction and changes to object attribute states. Compared with existing datasets, 3D reconstruction datasets, synthetic datasets, and 3D simulators, BVS is presented as stronger in customization and visual quality. Beyond offering user-friendly tools for generating customized data, it addresses limitations of existing datasets such as expensive annotation, static images, and fixed data distributions. The experiments illustrate its value for robustness evaluation, scene understanding, and training models for new tasks.
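The controlled-experiment idea described above (vary one parameter, hold the rest fixed) can be illustrated with a minimal sketch. This is not the BVS API: `RenderConfig`, its field names, its default values, and `sweep` are hypothetical names chosen only to mirror the scene-level, object-level, and camera-level parameters mentioned in the abstract.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass(frozen=True)
class RenderConfig:
    """Hypothetical generation settings; names and defaults are assumptions."""
    light_intensity: float = 1.0     # scene level
    joint_openness: float = 0.0      # object level: 0 = closed, 1 = fully open
    field_of_view_deg: float = 60.0  # camera level


def sweep(base: RenderConfig, axis: str, values: List[float]) -> List[RenderConfig]:
    """Return one config per value, changing only `axis` and keeping all
    other parameters at their values in `base`."""
    valid_axes = {f.name for f in fields(base)}
    if axis not in valid_axes:
        raise ValueError(f"unknown axis {axis!r}; expected one of {sorted(valid_axes)}")
    return [replace(base, **{axis: value}) for value in values]


# One continuous axis of domain shift: lighting, from dim to bright.
base = RenderConfig()
configs = sweep(base, "light_intensity", [0.25, 0.5, 1.0, 2.0])
```

Each resulting config would then drive one batch of generated images, so any change in model accuracy across batches can be attributed to the swept parameter alone.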