Abstract: Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to incorrect conclusions. Therefore, validating AI assistance is crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe analysts in diverse verification approaches, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables with common data operations. Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows and how analysts' programming, analysis, and tool backgrounds reflect these behaviors. Additionally, we provide recommendations for analysts and highlight opportunities for designers to improve future AI-assistant experiences.

What problem does this paper attempt to address?

The core problem this paper addresses is how analysts understand and verify the correctness of AI-assisted data analyses. With the development of large language models (LLMs), AI assistants can translate natural-language instructions into code, helping data analysts execute and automate their analysis tasks. However, an assistant's responses and generated analysis code may be misaligned with the analyst's intent, or may seem correct yet lead to incorrect conclusions. Verifying the correctness and reliability of AI-assisted analysis is therefore both crucial and challenging. Specifically, the paper explores this problem through the following aspects:

1. **Research Background**:
   - Data analysis is a complex task that requires combining domain knowledge, statistical expertise, and programming skills.
   - AI assistants such as ChatGPT can simplify the data analysis process by translating natural-language instructions into code, but their outputs may misinterpret the analyst's request or contain errors.
2. **Research Objectives**:
   - Explore how analysts understand and verify AI-generated analysis results.
   - Through a qualitative user study (n = 22), observe the behavior patterns of analysts with different backgrounds when verifying AI-assisted analyses.
   - Provide recommendations to improve the design of future AI assistants, enabling analysts to evaluate AI-generated analysis results more effectively.
3. **Research Methods**:
   - Develop a design probe that includes natural-language explanations, code, visualizations, and interactive data tables with common data operations, to support analysts' different verification needs.
   - Through qualitative methods, observe the specific behaviors of analysts when verifying AI-generated analyses, and analyze how these behaviors relate to the analysts' programming, analysis, and tool backgrounds.
4. **Main Findings**:
   - Analysts usually start with program-oriented behaviors ("What did the AI do?") and then turn to data-oriented behaviors ("Does the result data make sense?").
   - Data artifacts (such as data tables and summary visualizations) and program artifacts (such as natural-language explanations and code comments) complement each other in the verification process and jointly help analysts understand the AI's analysis process.
5. **Contributions**:
   - Reveal the common behavior patterns of analysts when verifying AI-generated analyses.
   - Propose recommendations for end-user analysts and tool developers, aiming to improve the reliability and verifiability of AI-assisted data analysis.

In conclusion, by studying in depth how analysts understand and verify the results of AI-assisted data analysis, this paper reveals the challenges current AI assistants face in practical use and provides insights and recommendations for the design of future AI assistants.
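To make the data-oriented question ("Does the result data make sense?") concrete, here is a minimal sketch of the kind of check an analyst might run against an AI-generated aggregate. It is not taken from the paper or its design probe: the dataset, the `ai_result` values, and the helper `verify_group_means` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data the analyst supplied (illustrative values only).
rows = [
    {"region": "north", "sales": 120.0},
    {"region": "north", "sales": 80.0},
    {"region": "south", "sales": 200.0},
    {"region": "south", "sales": None},  # missing value
    {"region": "east", "sales": 50.0},
]

# Hypothetical output of AI-generated code: mean sales per region.
ai_result = {"north": 100.0, "south": 200.0, "east": 50.0}


def verify_group_means(rows, result, key, value, tol=1e-9):
    """Return (problems, missing): a list of issues and the count of skipped rows."""
    problems = []
    groups = defaultdict(list)
    missing = 0
    for row in rows:
        if row[value] is None:
            missing += 1  # surface silently dropped rows to the analyst
            continue
        groups[row[key]].append(row[value])

    # Check 1: the result covers exactly the groups present in the data.
    if set(groups) != set(result):
        problems.append(f"group mismatch: {sorted(set(groups) ^ set(result))}")

    for name, values in groups.items():
        if name not in result:
            continue
        # Check 2: an independent recomputation agrees with the AI's value.
        if abs(mean(values) - result[name]) > tol:
            problems.append(f"{name}: recomputed {mean(values)}, got {result[name]}")
        # Check 3: a mean must lie within the observed range of its group.
        if not (min(values) <= result[name] <= max(values)):
            problems.append(f"{name}: mean outside observed range")

    return problems, missing


problems, missing = verify_group_means(rows, ai_result, "region", "sales")
print("problems:", problems)
print("rows skipped for missing values:", missing)
```

Checks like these complement reading the generated code: the recomputation catches a wrong aggregate, while the missing-row count exposes a data-handling decision the analyst may not have intended.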

How Do Analysts Understand and Verify AI-Assisted Data Analyses?

Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition

How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study

Programming with AI: Evaluating ChatGPT, Gemini, AlphaCode, and GitHub Copilot for Programmers

Data Analysis in the Era of Generative AI

Understanding the Usability of AI Programming Assistants

"It's like a rubber duck that talks back": Understanding Generative AI-Assisted Data Analysis Workflows through a Participatory Prompting Study

Beyond Static Evaluation: A Dynamic Approach to Assessing AI Assistants' API Invocation Capabilities

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education

Let's Ask AI About Their Programs: Exploring ChatGPT's Answers To Program Comprehension Questions

Human-AI Collaboration in Thematic Analysis using ChatGPT: A User Study and Design Recommendations

Exploring the Role of AI Assistants in Computer Science Education: Methods, Implications, and Instructor Perspectives

Unpacking Help-Seeking Process Through Multimodal Learning Analytics: A Comparative Study of ChatGPT Vs Human Expert

Using AI-Based Coding Assistants in Practice: State of Affairs, Perceptions, and Ways Forward

Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis

Harnessing AI for efficient analysis of complex policy documents: a case study of Executive Order 14110

A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges

WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization

The Accuracy of Domain Specific and Descriptive Analysis Generated by Large Language Models

Is GPT-4 a Good Data Analyst?

Co-audit: tools to help humans double-check AI-generated content