Scaling up the Evaluation of Collaborative Problem Solving: Promises and Challenges of Coding Chat Data with ChatGPT

Jiangang Hao, Wenju Cui, Patrick Kyllonen, Emily Kerzabi, Lei Liu, Michael Flor
2024-11-15
Abstract: Collaborative problem solving (CPS) is widely recognized as a critical 21st-century skill. Efficiently coding communication data is a big challenge in scaling up research on assessing CPS. This paper reports the findings on using ChatGPT to directly code CPS chat data by benchmarking performance across multiple datasets and coding frameworks. We found that ChatGPT-based coding outperformed human coding in tasks where the discussions were characterized by colloquial language but fell short in tasks where the discussions dealt with specialized scientific terminology and contexts. The findings offer practical guidelines for researchers to develop strategies for efficient and scalable analysis of communication data from CPS tasks.
Human-Computer Interaction, Computation and Language
What problem does this paper attempt to address?
The key problem this paper addresses is how to use large language models (LLMs) such as ChatGPT to automatically code chat data from collaborative problem solving (CPS) tasks, so that CPS research becomes more efficient and scalable. Specifically, the paper explores the following aspects:

1. **Challenges in efficiently coding communication data**
   - Analyzing CPS communication data has traditionally relied on manual coding, which is time-consuming and labor-intensive.
   - As the volume of digital communication data grows, there is an urgent need for methods that can code large amounts of data automatically without sacrificing depth or quality.
2. **Using ChatGPT for automatic coding**
   - The paper evaluates how well ChatGPT codes CPS chat data directly, across different tasks and coding frameworks.
   - The researchers benchmarked ChatGPT's coding performance on multiple datasets and coding frameworks to determine its effectiveness and limitations in practice.
3. **Research questions**
   - **RQ1**: How accurate are various GPT models when coding chat data from CPS tasks?
   - **RQ2**: How do task characteristics and communication styles affect the coding performance of these models?
   - **RQ3**: What lessons from the prompting process can be used to improve coding performance?

### Main Findings

- For tasks with non-technical discussions, such as general cognitive-skill tasks (negotiation, decision-making, problem solving), ChatGPT's coding performance was comparable to, or even better than, that of human coders.
- For science tasks, ChatGPT performed poorly, especially on chat data containing many scientific terms.
- ChatGPT's strength lies in coding conversations conducted in everyday language; it is weaker when discussions involve specialized terminology and technical contexts.
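The prompting-based coding that RQ3 refers to can be illustrated with a minimal sketch. It assumes a hypothetical four-category coding framework: the labels `SHARE`, `NEGOTIATE`, `REGULATE` and `MAINTAIN`, their definitions, the prompt wording, and the helpers `build_prompt` and `parse_label` are all illustrative and are not the paper's frameworks or prompts. The sketch only assembles the prompt text and validates a reply; the actual ChatGPT call is left out because the paper's prompts and API settings are not reproduced here.

```python
# Hypothetical sketch: turning a CPS coding framework into a ChatGPT prompt.
# The labels, definitions, and prompt wording are illustrative assumptions,
# not the coding frameworks or prompts used in the paper.
from typing import Dict, Optional

# Hypothetical coding framework: label -> short definition.
FRAMEWORK: Dict[str, str] = {
    "SHARE": "Shares information or ideas with teammates.",
    "NEGOTIATE": "Proposes, questions, or revises a solution with others.",
    "REGULATE": "Plans, monitors, or evaluates the team's progress.",
    "MAINTAIN": "Social talk that keeps the conversation going.",
}


def build_prompt(framework: Dict[str, str], chat_turn: str) -> str:
    """Assemble a prompt asking the model to assign one label to one chat turn."""
    lines = [
        "You are coding chat turns from a collaborative problem-solving task.",
        "Assign exactly one label from the list below.",
        "",
    ]
    # List every allowed label with its definition so the model sees the full framework.
    for label, definition in framework.items():
        lines.append(f"{label}: {definition}")
    lines += ["", f'Chat turn: "{chat_turn}"', "Answer with the label only."]
    return "\n".join(lines)


def parse_label(response: str, framework: Dict[str, str]) -> Optional[str]:
    """Normalize a raw model reply; return a valid label, or None if it is unusable."""
    cleaned = response.strip().strip(".").upper()
    return cleaned if cleaned in framework else None


if __name__ == "__main__":
    print(build_prompt(FRAMEWORK, "I think we should pick option B."))
    # A real pipeline would send the prompt to ChatGPT here; this reply is made up.
    print(parse_label(" negotiate.\n", FRAMEWORK))
```

Restricting the reply to a fixed label set and returning `None` for anything else is one possible design choice: it makes invalid model outputs easy to count and re-prompt, which helps when comparing coding accuracy across datasets and frameworks.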
### Conclusion

This study shows that, in some application scenarios, using ChatGPT to code CPS chat data is feasible and reliable and can substantially reduce time and cost. For tasks involving many technical terms, however, caution is still required, and further prompt engineering or more capable LLMs should be considered.

### Formula Example

The paper does not involve complex mathematical formulas. Agreement between coders, for example between ChatGPT and human raters, is commonly summarized with the kappa coefficient:

$$
\kappa = \frac{P_o - P_e}{1 - P_e}
$$

where $P_o$ is the observed proportion of agreement and $P_e$ is the proportion of agreement expected by chance.
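The kappa formula can be computed directly from two coders' label sequences. The sketch below is a plain-Python implementation of Cohen's kappa for two coders; the function name `cohens_kappa` and the toy label lists are hypothetical and are not data or code from the paper.

```python
# Minimal sketch of Cohen's kappa for two coders (e.g., ChatGPT vs. a human rater).
# The label lists at the bottom are toy data, not results from the paper.
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Return kappa = (P_o - P_e) / (1 - P_e) for two equal-length label sequences."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must label the same non-empty set of turns.")
    n = len(coder_a)
    # P_o: observed proportion of turns on which the two coders agree.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # P_e: chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both coders used the same single label for every turn; kappa is undefined.
        raise ValueError("Chance agreement equals 1, so kappa is undefined.")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    human = ["SHARE", "SHARE", "NEGOTIATE", "REGULATE", "SHARE", "MAINTAIN"]
    gpt = ["SHARE", "NEGOTIATE", "NEGOTIATE", "REGULATE", "SHARE", "SHARE"]
    print(round(cohens_kappa(human, gpt), 3))  # → 0.5
```

In the toy data the coders agree on 4 of 6 turns ($P_o \approx 0.67$), and chance agreement from the marginal frequencies is $P_e = 12/36 \approx 0.33$, so $\kappa = (0.67 - 0.33)/(1 - 0.33) = 0.5$.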