Abstract: Collaborative problem solving (CPS) is widely recognized as a critical 21st century skill. Efficiently coding communication data is a big challenge in scaling up research on assessing CPS. This paper reports the findings on using ChatGPT to directly code CPS chat data by benchmarking performance across multiple datasets and coding frameworks. We found that ChatGPT-based coding outperformed human coding in tasks where the discussions were characterized by colloquial language but fell short in tasks where the discussions dealt with specialized scientific terminology and contexts. The findings offer practical guidelines for researchers to develop strategies for efficient and scalable analysis of communication data from CPS tasks.

What problem does this paper attempt to address?

The key problem this paper addresses is how to use large language models (LLMs) such as ChatGPT to automatically code chat data from collaborative problem solving (CPS) tasks, in order to improve the efficiency and scalability of research. Specifically, the paper explores the following aspects:

1. **Challenges in Efficiently Coding Communication Data**:
   - Traditionally, analyzing communication data in CPS depends on manual coding, which is time-consuming and labor-intensive.
   - As the amount of digital communication data grows, there is an urgent need for more efficient methods that can code large amounts of data automatically without reducing depth and quality.
2. **Possibilities of Using ChatGPT for Automatic Coding**:
   - The paper evaluates ChatGPT's performance in directly coding CPS chat data, especially across different tasks and coding frameworks.
   - The researchers benchmarked ChatGPT's coding performance on multiple datasets and coding frameworks to determine its effectiveness and limitations in practical applications.
3. **Specific Research Questions**:
   - **RQ1**: How accurate are various GPT models when coding chat data in CPS tasks?
   - **RQ2**: How do task characteristics and communication styles affect the coding performance of these models?
   - **RQ3**: What lessons can be learned from the prompting process to improve coding performance?

### Main Findings

- For tasks with non-technical discussions, such as general cognitive-skill tasks (negotiation, decision-making, problem-solving), ChatGPT's coding performance is comparable to, or even better than, that of human coders.
- For scientific tasks involving technical terms, ChatGPT performs poorly, especially when the chat data contains a large number of scientific terms.
- ChatGPT's advantage lies in handling conversations in everyday language, but it is weaker when dealing with specialized terms and technical contexts.
### Conclusion

This study shows that in some application scenarios, using ChatGPT to code CPS chat data is feasible and reliable, and can significantly reduce time and cost. However, for tasks involving a large number of technical terms, caution is still required, and further prompt engineering or the use of more powerful LLMs should be considered.

### Formula Example

The paper does not involve complex mathematical formulas. As an example, the formula for calculating the Kappa coefficient is:

$$ \kappa = \frac{P_o - P_e}{1 - P_e} $$

where $P_o$ is the observed proportion of agreement and $P_e$ is the proportion of agreement expected by chance.
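The Kappa formula above can be illustrated with a minimal Python sketch that compares human-assigned codes with model-assigned codes for the same chat turns. The function name `agreement_stats`, the category labels, and the sample data are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not presented as the authors' evaluation procedure.

```python
from collections import Counter


def agreement_stats(human_codes, model_codes):
    """Return (accuracy, kappa) for two equal-length lists of category labels."""
    if len(human_codes) != len(model_codes) or not human_codes:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human_codes)

    # P_o: observed proportion of turns on which the two coders agree.
    p_o = sum(h == m for h, m in zip(human_codes, model_codes)) / n

    # P_e: agreement expected by chance, from each coder's label frequencies.
    human_freq = Counter(human_codes)
    model_freq = Counter(model_codes)
    p_e = sum(human_freq[c] * model_freq[c] for c in human_freq) / (n * n)

    if p_e == 1.0:
        # Both coders used a single identical label; kappa is undefined (0/0).
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six chat turns (illustrative data only).
human = ["share", "share", "negotiate", "regulate", "share", "negotiate"]
model = ["share", "negotiate", "negotiate", "regulate", "share", "share"]

accuracy, kappa = agreement_stats(human, model)
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")  # accuracy=0.667 kappa=0.455
```

In this toy example the coders agree on 4 of 6 turns ($P_o = 4/6$), chance agreement is $P_e = 14/36$, and so $\kappa = 10/22 \approx 0.455$, noticeably lower than the raw accuracy because part of the agreement would be expected by chance.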

Scaling up the Evaluation of Collaborative Problem Solving: Promises and Challenges of Coding Chat Data with ChatGPT
