False Positives and Deceptive Errors in SQL Assessment: A Large-scale Analysis of Online Judge Systems

Jinshui Wang,Shuguang Chen,Zhengyi Tang,Pengchen Lin,Yupeng Wang
DOI: https://doi.org/10.1145/3654677
2024-03-28
ACM Transactions on Computing Education
Abstract:Online Judge Systems (OJSs) play a crucial role in evaluating SQL programming skills. However, OJSs may not accurately evaluate students’ queries due to the error-detection capabilities of test sets are insufficient, resulting in false positives that can mislead students and hinder their learning. This study analyzes a large-scale OJS’s evaluation dataset and identifies more than 110,000 (1.94%) false positive queries. It also validates existing SQL error categorization and reveals a new type of logical error called deceptive error, which occurs when students construct queries that pass specific test cases but fail to solve the actual problem. This type of error has been overlooked in previous research and can provide new insights into how to improve OJSs by enhancing test cases and feedback. This study contributes to the understanding of assessment and evaluation practices and processes in programming education, particularly the contribution that OJSs make to student learning and to course, staff and institutional development. It also suggests error prevention and detection techniques that can improve the effectiveness and fairness of OJSs in programming education and competitions.
education, scientific disciplines
What problem does this paper attempt to address?
This paper attempts to solve the misjudgment problems that occur when online judging systems (OJSs) evaluate SQL queries, especially false positives and deceptive errors. These problems may lead to the situation where the queries submitted by students pass the test cases but do not actually solve the problems correctly, thus misleading students and affecting their learning outcomes. Specifically, the paper focuses on the following two research questions: 1. **RQ1**: What kinds of errors exist in SQL queries that are wrongly considered correct by automatic evaluation systems (such as OJSs), and what are the characteristics of these kinds of errors? 2. **RQ2**: What kinds of errors exist in SQL queries that aim to mislead automatic evaluation systems (such as OJSs), and what are the characteristics of these kinds of errors? ### Background and Motivation - **The Importance of SQL Skills**: With the wide application of database systems, SQL skills have become one of the core competencies of computer science students and are highly valued by the software industry. - **Challenges of Manual Evaluation**: Manually evaluating SQL queries is not only time - consuming but also difficult to provide timely feedback. Therefore, automatic evaluation tools (such as OJSs) have important application value in higher education. - **Limitations of OJSs**: Although OJSs are widely used in programming education, they mainly rely on the comparison of execution results and do not evaluate the correctness of query logic, which leads to the occurrence of false positives and deceptive errors. ### Research Methods 1. **Data Collection**: - Use a public data set containing more than 5.8 million SQL queries submitted by students and 22 programming problems. - Each programming problem in the data set provides requirements and the corresponding table structure. - Students can choose different DBMSs (such as MySQL, MS SQL Server, Oracle) and submit queries. The OJS will execute the queries and return the execution results, expected results, and compilation messages. 2. **Detecting Erroneous Queries**: - Use the Python Faker library to generate stronger test cases to detect false positives. - Verify the error - detecting ability of test cases through the mutation testing method. - Execute all the queries marked as correct and re - mark the queries with incorrect execution results as false positives. 3. **Data Analysis**: - Adopt directed content analysis and general content analysis methods to classify and analyze the errors in false positives. - Directed content analysis is used to verify or extend the existing SQL error classification framework. - General content analysis is used to name and classify newly discovered errors. ### Results - **False Positive Proportion**: Among approximately 5.8 million queries, about 1.94% (112,200 queries) are marked as false positives. - **Logical Errors**: The queries in false positives pass some test cases but fail to solve the problems correctly, so they belong to logical errors. - **Deceptive Errors**: Although some queries obtain correct execution results on specific test cases, their semantics clearly do not meet the requirements of the topic. These queries are defined as deceptive errors. ### Classification of Logical Errors - **Boundary Value Errors**: - Range - Specific Boundary Error - **Constraint Formulation Deviations**: - Constraint Omission Error - Redundant Constraint Error - Misplaced Condition Error - **Redundancy - Sort Discrepancies**: - Unintended Record Deduplication - Redundant Record Inclusion - Column Ordering Deviation