MultiPoT: Multilingual Program of Thoughts Harnesses Multiple Programming Languages

Xianzhen Luo, Qiang Zhu, Zhiming Zhang, Libo Qin, Qianqian Wang, Qing Yang, Dong Xu, Wanxiang Che
DOI: https://doi.org/10.48550/arxiv.2402.10691
2024-01-01
Abstract: Program of Thoughts (PoT) is an approach characterized by its executable intermediate steps, which ensure the accuracy of the numerical calculations in the reasoning process. Currently, PoT primarily uses Python. However, relying on a single language may yield suboptimal solutions and overlook the potential benefits of other programming languages. In this paper, we conduct comprehensive experiments on the programming languages used in PoT and find that no single language consistently delivers optimal performance across all tasks and models; the effectiveness of each language varies with the specific scenario. Inspired by this, we propose a task- and model-agnostic approach called MultiPoT, which harnesses the strength and diversity of various languages. Experimental results reveal that it significantly outperforms Python Self-Consistency. Furthermore, it achieves performance comparable or superior to the best monolingual PoT in almost all tasks across all models. In particular, MultiPoT achieves more than 4.6% improvement on average on both Starcoder and ChatGPT (gpt-3.5-turbo).
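The abstract does not spell out how MultiPoT combines results from different languages. As a rough illustration only, the sketch below assumes a simple majority vote over the final answers obtained by executing one generated program per language, in the spirit of Self-Consistency. The function name `vote_across_languages`, the language keys, and the sample values are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, Optional


def vote_across_languages(
    answers: Dict[str, Optional[float]], ndigits: int = 6
) -> Optional[float]:
    """Return the most common numeric answer among per-language results.

    `answers` maps a language name to the value produced by executing the
    program generated in that language, or None if execution failed.
    Majority voting is an assumption made for this sketch; the abstract
    does not state the aggregation rule MultiPoT actually uses.
    """
    # Drop failed executions and round so that tiny floating-point
    # differences between runtimes do not split the vote.
    valid = [round(a, ndigits) for a in answers.values() if a is not None]
    if not valid:
        return None
    # most_common(1) returns the single most frequent answer; ties are
    # resolved in favor of the answer that was encountered first.
    return Counter(valid).most_common(1)[0][0]


# Hypothetical execution results for one question (placeholder values).
answers = {
    "python": 42.0,
    "r": 42.0,
    "cpp": 41.0,
    "java": None,  # e.g. the generated program failed to run
    "javascript": 42.0,
}
result = vote_across_languages(answers)
```

In this sketch a language whose program fails simply abstains, so one broken program does not prevent the other languages from producing an answer.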