DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries
Manit Mishra, Abderrahman Braham, Charles Marsom, Bryan Chung, Gavin Griffin, Dakshesh Sidnerlikar, Chatanya Sarin, Arjun Rajaram
DOI: https://doi.org/10.1109/ICAIC60265.2024.10433803
2024-03-30
Abstract: Conventional processes for analyzing datasets and extracting meaningful information are often time-consuming and laborious. Previous work has identified manual, repetitive coding and data collection as major obstacles that keep data scientists from more nuanced, higher-level work. To address this, we evaluated OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS) that can extract key findings, including correlations and basic information, from a given dataset. The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards, including data science code-generation tasks involving libraries such as NumPy, Pandas, Scikit-Learn, and TensorFlow. It was broadly successful in correctly answering data science queries about the benchmark datasets. To answer each question, the LDS used various novel prompt engineering techniques, including Chain-of-Thought reinforcement and SayCan prompt engineering. Our findings demonstrate great potential for leveraging Large Language Models for low-level, zero-shot data analysis.
Computation and Language, Artificial Intelligence