An Automated SQL Query Grading System Using An Attention-Based Convolutional Neural Network

Donald R. Schwartz, Pablo Rivas
2024-06-23
Abstract: Grading SQL queries can be a time-consuming, tedious and challenging task, especially as the number of student submissions increases. Several systems have been introduced in an attempt to mitigate these challenges, but those systems have their own limitations. This paper describes our novel approach to automating the process of grading SQL queries. Unlike previous approaches, we use a unique convolutional neural network architecture that shares parameters across different machine learning tasks. This enables the architecture to induce different knowledge representations of the data, which increases its potential for understanding SQL statements.
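The parameter-sharing idea in the abstract can be sketched as one encoder whose weights are reused by several task-specific output heads. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' architecture. The 1-D convolution, ReLU, global average pooling, logistic heads, and the names `SharedConvEncoder`, `TaskHead`, `MultiTaskModel`, `task_a` and `task_b` are all hypothetical; the paper's actual tasks and layer configuration are not given here.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


class SharedConvEncoder:
    """Hypothetical shared encoder: 1-D convolution over token embeddings
    (no padding, stride 1), ReLU, then global average pooling per filter."""

    def __init__(self, emb_dim: int, num_filters: int, width: int, seed: int = 0):
        rng = random.Random(seed)
        # filters[f][j][d]: weight of filter f at window offset j, embedding dim d
        self.filters = [
            [[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)] for _ in range(width)]
            for _ in range(num_filters)
        ]
        self.emb_dim = emb_dim
        self.width = width

    def __call__(self, tokens: Matrix) -> List[float]:
        n_windows = len(tokens) - self.width + 1
        if n_windows < 1:
            raise ValueError("sequence is shorter than the convolution width")
        pooled = []
        for filt in self.filters:
            acts = []
            for start in range(n_windows):
                s = sum(
                    filt[j][d] * tokens[start + j][d]
                    for j in range(self.width)
                    for d in range(self.emb_dim)
                )
                acts.append(max(0.0, s))  # ReLU
            pooled.append(sum(acts) / n_windows)  # global average pooling
        return pooled


class TaskHead:
    """Hypothetical task-specific logistic output layer (one per task)."""

    def __init__(self, in_dim: int, seed: int):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
        self.b = 0.0

    def __call__(self, features: List[float]) -> float:
        z = sum(w * x for w, x in zip(self.w, features)) + self.b
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid probability


class MultiTaskModel:
    """One encoder, several heads: every task reads the same encoder weights."""

    def __init__(self, encoder: SharedConvEncoder, tasks: List[str], num_filters: int):
        self.encoder = encoder
        self.heads: Dict[str, TaskHead] = {
            name: TaskHead(num_filters, seed=i + 1) for i, name in enumerate(tasks)
        }

    def predict(self, task: str, tokens: Matrix) -> float:
        return self.heads[task](self.encoder(tokens))


if __name__ == "__main__":
    encoder = SharedConvEncoder(emb_dim=4, num_filters=3, width=2)
    model = MultiTaskModel(encoder, ["task_a", "task_b"], num_filters=3)
    tokens = [[0.1 * (i + d) for d in range(4)] for i in range(5)]  # toy embeddings
    for task in ("task_a", "task_b"):
        print(task, model.predict(task, tokens))
```

Because both heads call the same `encoder` object, a gradient update from either task would change the features seen by the other; that coupling is what lets one task shape the representation used by another.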
Computers and Society, Artificial Intelligence, Databases, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of automatically grading SQL queries. The authors focus on three key issues:

1. **Multiple correct answers**: The same result can be obtained with SQL queries written in many different ways. For example, even the simple query "List the names of professors who teach the 'Introduction to Programming' course" can be written as a non-nested query, a nested query using EXISTS, or a nested query using IN, among others. As query complexity grows, the number of possible correct answers increases significantly.
2. **Consistent partial credit**: Assigning partial credit consistently to queries that are not fully correct is difficult, because the diversity of possible SQL formulations makes it hard to apply the same grading criteria across submissions.
3. **Tedium of grading**: As the number of student submissions grows, manual grading becomes time-consuming and error-prone, since instructors must check and grade each SQL query individually.

To address these problems, the authors propose an automated SQL grading system based on a self-attention mechanism and a convolutional neural network (CNN). Its distinguishing feature is parameter sharing across learning tasks, which lets the model learn from different tasks and induce different representations of the data, and thereby better understand SQL statements.

### Key Formulas

Several formulas describe key parts of the model architecture:

1. **Convolutional self-attention layer**:
   - A convolutional encoding layer models the query \( Q \) and the value \( V \):
     \[ Q = W_Q \cdot E, \quad V = W_V \cdot E \]
     where \( E \) is the input embedding and \( W_Q \) and \( W_V \) are weight matrices.
   - The self-attention mechanism computes scaled dot-product similarity:
     \[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \]
     where \( K \) is the key matrix (in standard self-attention, a third projection \( K = W_K \cdot E \) of the same embeddings) and \( d_k \) is the dimension of the key vectors.
2. **Pooling strategy**:
   - Global average pooling reduces dimensionality by averaging each channel of the feature map \( X \) over all positions:
     \[ \text{GlobalAvgPool}(X) = \frac{1}{n} \sum_{i = 1}^{n} X_i \]
     where \( X_i \) is the feature vector at position \( i \) and \( n \) is the number of positions.
3. **Loss function**:
   - The model is trained with binary cross-entropy loss:
     \[ L = -\frac{1}{N} \sum_{i = 1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \]
     where \( y_i \) is the true label, \( p_i \) is the predicted probability, and \( N \) is the number of samples.

Through these methods, the system aims to increase the automation of SQL query grading, reduce the time spent on and errors made in manual grading, and keep grading consistent and accurate.
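The attention, pooling and loss formulas can be checked numerically with a small pure-Python sketch. It treats each row of \( E \) as one token embedding and applies a weight matrix to each token, so \( QK^T \) becomes an \( n \times n \) matrix of token-to-token scores. The key projection \( W_K \), the plain linear projections standing in for the convolutional encoding layer, the clipping constant `eps`, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product W · x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def project(w: Matrix, e: Matrix) -> Matrix:
    """Row i of the result is W · e_i, where e_i is the i-th token embedding."""
    return [matvec(w, tok) for tok in e]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, computed one query row at a time."""
    d_k = len(k[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def global_avg_pool(x: Matrix) -> List[float]:
    """Average each channel (column) of X over the n positions (rows)."""
    n = len(x)
    return [sum(row[c] for row in x) / n for c in range(len(x[0]))]


def bce(y: List[float], p: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; p is clipped to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi)
    return -total / len(y)


if __name__ == "__main__":
    E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
    W_Q = [[1.0, 0.0], [0.0, 1.0]]
    W_K = [[0.5, 0.0], [0.0, 0.5]]  # hypothetical key projection
    W_V = [[0.0, 1.0], [1.0, 0.0]]
    Q, K, V = project(W_Q, E), project(W_K, E), project(W_V, E)
    pooled = global_avg_pool(attention(Q, K, V))
    print(pooled, bce([1.0, 0.0], [0.9, 0.2]))
```

When all keys are identical, every score in a row is equal, so the softmax weights are uniform and each attention output is simply the mean of the value rows; this is a quick sanity check for the formula.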