CodeMind: A Framework to Challenge Large Language Models for Code Reasoning

Changshu Liu, Shizhuo Dylan Zhang, Ali Reza Ibrahimzada, Reyhaneh Jabbarvand
2024-04-03
Abstract: Relying solely on test passing to evaluate Large Language Models (LLMs) for code synthesis can lead to unfair assessment or promote models affected by data leakage. As an alternative, we introduce CodeMind, a framework designed to gauge the code reasoning abilities of LLMs. CodeMind currently supports three code reasoning tasks: Independent Execution Reasoning (IER), Dependent Execution Reasoning (DER), and Specification Reasoning (SR). The first two evaluate whether a model can predict the execution output of arbitrary code (IER) or of code that the model itself could correctly synthesize (DER). The third evaluates the extent to which LLMs implement the specified expected behavior.
Software Engineering, Artificial Intelligence, Computation and Language, Programming Languages
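
To make the execution reasoning idea from the abstract concrete, the following is a minimal sketch of how a single Independent Execution Reasoning check might be scored: run a snippet, capture its real output, and compare it with the output a model predicted. It is not the CodeMind implementation. The names `run_and_capture`, `ier_correct`, and `SNIPPET` are hypothetical, and exact string matching after whitespace stripping is an assumed scoring rule.

```python
import contextlib
import io


def run_and_capture(source: str) -> str:
    """Execute `source` and return whatever it prints to stdout, stripped."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # exec is used for illustration only; run untrusted code in a sandbox.
        exec(source, {})
    return buffer.getvalue().strip()


def ier_correct(source: str, predicted_output: str) -> bool:
    """Assumed scoring rule: the prediction must equal the real output exactly."""
    return run_and_capture(source) == predicted_output.strip()


# Hypothetical snippet a model would be asked to reason about.
SNIPPET = "xs = [3, 1, 2]\nprint(sorted(xs)[-1] * 2)"

if __name__ == "__main__":
    print(ier_correct(SNIPPET, "6"))  # a correct prediction
    print(ier_correct(SNIPPET, "3"))  # an incorrect prediction
```

Under the same assumptions, a Dependent Execution Reasoning check would reuse this comparison but restrict the snippets to programs the model itself synthesized and that already pass their tests.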