Probing Mechanical Reasoning in Large Vision Language Models

Haoran Sun, Qingying Gao, Haiyun Lyu, Dezhi Luo, Hokin Deng, Yijiang Li
2024-10-01
Abstract: Mechanical reasoning is a fundamental ability that sets human intelligence apart from other animal intelligence. Mechanical reasoning allows us to design tools, build bridges and canals, and construct houses which set the foundation of human civilization. Embedding machines with such ability is an important step towards building human-level artificial intelligence. Recently, Li et al. built CogDevelop2K, a data-intensive cognitive experiment benchmark for assaying the developmental trajectory of machine intelligence (Li et al., 2024). Here, to investigate mechanical reasoning in Vision Language Models, we leverage the MechBench of CogDevelop2K, which contains approximately 150 cognitive experiments, to test understanding of mechanical system stability, gears and pulley systems, seesaw-like systems and leverage principle, inertia and motion, and other fluid-related systems in Large Vision Language Models. We observe diverse yet consistent behaviors over these aspects in VLMs.
Artificial Intelligence, Neurons and Cognition
What problem does this paper attempt to address?
This paper evaluates the mechanical reasoning capabilities of large Vision Language Models (VLMs). Specifically, the authors use MechBench, the part of CogDevelop2K that contains approximately 150 cognitive experiments, to test how well VLMs understand the following:

1. **Mechanical system stability**: For example, determining which objects are more likely to tip over or remain stable.
2. **Pulley systems**: For example, determining which pulley system requires the least effort to lift a heavy object.
3. **Gear systems**: For example, determining how the rotation direction of one gear affects the rotation direction of another gear.
4. **Seesaw systems and the principle of the lever**: For example, determining how to adjust a position so that the seesaw balances.
5. **Inertia and motion**: For example, determining the motion state of an object under different conditions.
6. **Fluid mechanics**: For example, determining the behavior of a fluid system.

Through these experiments, the authors aim to characterize how current VLMs perform on these mechanical reasoning tasks and how that performance differs from human performance. The results help reveal the strengths and limitations of VLMs in mechanical reasoning and provide a basis for improving these models.
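To make the kinds of questions above concrete, the following minimal sketch encodes the idealized textbook physics behind three of the categories: gears, levers, and pulleys. It is not taken from the paper or from MechBench. The function names and parameters are hypothetical, and the models assume frictionless, massless components and a simple chain of externally meshed gears.

```python
# Illustrative only: idealized ground-truth rules for three task types.
# All names here are hypothetical and are not part of MechBench.

def gear_direction(first_direction: str, index: int) -> str:
    """Direction of gear `index` (0-based) in a simple chain of
    externally meshed gears, given the direction of gear 0.

    Adjacent meshed gears turn in opposite directions, so the
    direction alternates along the chain.
    """
    opposite = {"cw": "ccw", "ccw": "cw"}
    if first_direction not in opposite:
        raise ValueError("first_direction must be 'cw' or 'ccw'")
    return first_direction if index % 2 == 0 else opposite[first_direction]


def balancing_distance(weight_a: float, distance_a: float, weight_b: float) -> float:
    """Distance from the pivot at which weight_b balances weight_a.

    Lever principle: weight_a * distance_a == weight_b * distance_b.
    """
    return weight_a * distance_a / weight_b


def pulley_effort(load: float, supporting_segments: int) -> float:
    """Effort needed to hold `load` with an ideal pulley system in which
    `supporting_segments` rope segments support the moving block."""
    return load / supporting_segments


if __name__ == "__main__":
    print(gear_direction("cw", 1))        # second gear in the chain
    print(balancing_distance(30, 2, 20))  # where a lighter weight must sit
    print(pulley_effort(100, 4))          # effort with four supporting segments
```

A benchmark item presents situations like these as images and asks a model to choose the correct outcome. The rules above only describe what a correct answer looks like under ideal assumptions, not how the benchmark scores responses.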