R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Linger Deng, Yuliang Liu, Bohan Li, Dongliang Luo, Liang Wu, Chengquan Zhang, Pengyuan Lyu, Ziyang Zhang, Gang Zhang, Errui Ding, Yingying Zhu, Xiang Bai
2024-10-27
Abstract: Existing Large Multimodal Models (LMMs) struggle with mathematical geometric reasoning due to a lack of high-quality image-text paired data. Current geometric data generation approaches, which apply preset templates to generate geometric data or use Large Language Models (LLMs) to rephrase questions and answers (Q&A), unavoidably limit data accuracy and diversity. To synthesize higher-quality data, we propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions highlighting relations among geometric elements. We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results. Experiments demonstrate that the proposed method brings significant and consistent improvements on multiple LMM baselines, achieving new performance records in the 2B, 7B, and 8B settings. Notably, R-CoT-8B significantly outperforms previous state-of-the-art open-source mathematical models by 16.6% on MathVista and 9.2% on GeoQA, while also surpassing the closed-source model GPT-4o by an average of 13% across both datasets. The code is available at https://github.com/dle666/R-CoT.
Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve
This paper addresses the poor performance of existing Large Multimodal Models (LMMs) on mathematical geometric reasoning. Current LMMs struggle with geometry problems that require visual understanding because high-quality image-text paired data is scarce. Existing ways of generating geometric data, such as filling preset templates or using Large Language Models (LLMs) to rephrase questions and answers (Q&A), inevitably limit the accuracy and diversity of the data. To generate higher-quality data, the authors propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. The pipeline first uses GeoChain to generate high-fidelity geometric images together with descriptions that highlight the relations among geometric elements. It then applies a Reverse A&Q method that reasons step by step over these descriptions and generates questions in reverse from the reasoning results. Experiments show that the method brings significant and consistent improvements on multiple LMM baselines and sets new performance records in the 2B, 7B, and 8B parameter settings. In particular, R-CoT-8B outperforms the previous state-of-the-art open-source mathematical models by 16.6% on MathVista and 9.2% on GeoQA, and surpasses the closed-source model GPT-4o by an average of 13% across both datasets. In addition, R-CoT ensures stable training and high-fidelity data.
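The reverse direction of the pipeline (description first, reasoning second, question last) can be illustrated with a toy sketch. This is not the authors' implementation: the real pipeline renders images with GeoChain and relies on a model for the reasoning, whereas the sketch below hard-codes a single right triangle and two arithmetic steps. All names (`Description`, `describe_right_triangle`, `reason_forward`, `question_from_step`) are hypothetical and exist only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Description:
    """Hypothetical stand-in for a stage-one output: text plus the underlying values."""
    text: str
    legs: tuple  # (AC, BC) of a right triangle with the right angle at C


def describe_right_triangle(a, b):
    # Stage 1 (toy): state the figure and the relations among its elements.
    text = f"Triangle ABC has a right angle at C. AC = {a:g} and BC = {b:g}."
    return Description(text=text, legs=(a, b))


def reason_forward(desc):
    # Stage 2a (toy): derive results step by step from the description alone.
    a, b = desc.legs
    hyp = math.hypot(a, b)
    perimeter = a + b + hyp
    return [
        ("the length of AB",
         f"By the Pythagorean theorem, AB = sqrt({a:g}^2 + {b:g}^2) = {hyp:g}.",
         hyp),
        ("the perimeter of triangle ABC",
         f"The perimeter is {a:g} + {b:g} + {hyp:g} = {perimeter:g}.",
         perimeter),
    ]


def question_from_step(desc, steps, index):
    # Stage 2b (toy): pick an already-derived result and phrase a question
    # whose answer is that result, so the answer is known before the question.
    target, _, value = steps[index]
    rationale = " ".join(step[1] for step in steps[: index + 1])
    return {
        "question": f"{desc.text} What is {target}?",
        "rationale": rationale,
        "answer": value,
    }


desc = describe_right_triangle(3, 4)
steps = reason_forward(desc)
qa = question_from_step(desc, steps, 1)
print(qa["question"])
print(qa["rationale"])
print(qa["answer"])
```

Because the question is written only after its answer has been derived, every generated pair comes with a reasoning chain that is consistent with the description by construction. That is the property the reverse ordering is meant to provide, shown here under the simplifying assumptions above.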