Evaluating Automatic Difficulty Estimation of Logic Formalization Exercises

Alexandra Mayn, Kees van Deemter
DOI: https://doi.org/10.48550/arXiv.2204.12197
2022-04-26
Abstract: Teaching logic effectively requires an understanding of the factors which cause logic students to struggle. Formalization exercises, which require the student to produce a formula corresponding to the natural language sentence, are a good candidate for scrutiny since they tap into the students' understanding of various aspects of logic. We correlate the difficulty of formalization exercises predicted by a previously proposed difficulty estimation algorithm with two empirical difficulty measures on the Grade Grinder corpus, which contains student solutions to FOL exercises. We obtain a moderate correlation with both measures, suggesting that the said algorithm indeed taps into important sources of difficulty but leaves a fair amount of variance uncaptured. We conduct an error analysis, closely examining exercises which were misclassified, with the aim of identifying additional sources of difficulty. We identify three additional factors which emerge from the difficulty analysis, namely predicate complexity, pragmatic factors and typicality of the exercises, and discuss the implications of automated difficulty estimation for logic teaching and explainable AI.
Logic in Computer Science
### What problem does this paper attempt to address?

The main purpose of this paper is to evaluate and improve automatic difficulty estimation for logic formalization exercises. Specifically, the authors address the following questions:

1. **Effectiveness of the existing algorithm**:
   - The paper first evaluates how well the automatic difficulty estimation algorithm proposed by Perikos et al. (2016) matches actual student performance. This algorithm predicts the difficulty of a formalization exercise from features of the logical formula and of the natural language sentence.
   - The authors use the Grade Grinder corpus to compare the difficulty predicted by the algorithm with two empirical difficulty measures: First Attempt Correct (FAC) and Average Attempts (AA). The predictions correlate moderately with both measures, so the algorithm captures some important sources of difficulty but leaves a considerable part of the variance unexplained.
2. **Misclassification analysis**:
   - To improve the difficulty estimation, the authors conduct an error analysis and closely examine the exercises that the algorithm misclassifies. The aim is to identify additional difficulty factors that the existing algorithm does not consider.
3. **New difficulty factors**:
   - Based on the error analysis, the authors identify three additional difficulty factors:
     - **Predicate complexity**: Sentences involving multiple predicates or complex predicate structures may be harder to formalize.
     - **Pragmatic factors**: The context and intended meaning of a sentence may affect how students interpret it.
     - **Typicality of the exercises**: An exercise may be harder or easier depending on how typical or atypical it is.
4. **Implications for teaching and explainable AI**:
   - Finally, the authors discuss applications of automatic difficulty estimation in logic teaching and explainable AI. More accurate estimates of how difficult a logical formula is could help teachers design courses and help systems give users more effective explanations and support.

### Summary

This paper evaluates an existing automatic difficulty estimation algorithm for logic formalization exercises against student data and uses misclassification analysis to identify new difficulty factors. The results are relevant both to logic teaching and to the development of explainable AI.
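The comparison between predicted difficulty and the two empirical measures can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: the toy submission log, the predicted scores, the reading of FAC as the share of students whose first attempt is correct, the reading of AA as the mean number of attempts per student, and the choice of Spearman rank correlation. The text above does not specify the paper's exact definitions or which correlation coefficient was used, and this is not the Perikos et al. algorithm itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log: (student, exercise, attempt_number, is_correct).
SUBMISSIONS = [
    ("s1", "ex1", 1, True), ("s2", "ex1", 1, True),
    ("s1", "ex2", 1, False), ("s1", "ex2", 2, True), ("s2", "ex2", 1, True),
    ("s1", "ex3", 1, False), ("s1", "ex3", 2, False), ("s1", "ex3", 3, True),
    ("s2", "ex3", 1, False), ("s2", "ex3", 2, True),
]
# Hypothetical predicted difficulty scores (higher = harder).
PREDICTED = {"ex1": 1, "ex2": 2, "ex3": 3}


def empirical_difficulty(submissions):
    """Return (FAC, AA) per exercise under the assumed definitions."""
    per_pair = defaultdict(list)  # (student, exercise) -> [(attempt_no, ok)]
    for student, exercise, attempt_no, ok in submissions:
        per_pair[(student, exercise)].append((attempt_no, ok))
    first_ok = defaultdict(list)  # exercise -> 1.0/0.0 per student
    n_tries = defaultdict(list)   # exercise -> attempt count per student
    for (_, exercise), tries in per_pair.items():
        tries.sort()
        first_ok[exercise].append(1.0 if tries[0][1] else 0.0)
        n_tries[exercise].append(len(tries))
    fac = {ex: mean(v) for ex, v in first_ok.items()}
    aa = {ex: mean(v) for ex, v in n_tries.items()}
    return fac, aa


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    fac, aa = empirical_difficulty(SUBMISSIONS)
    exercises = sorted(PREDICTED)
    pred = [PREDICTED[ex] for ex in exercises]
    print("predicted vs AA: ", spearman(pred, [aa[ex] for ex in exercises]))
    print("predicted vs FAC:", spearman(pred, [fac[ex] for ex in exercises]))
```

If the predictions are informative, predicted difficulty should correlate positively with AA and negatively with FAC, because a higher FAC means an easier exercise. The toy data is constructed to be perfectly monotone, whereas the summary above reports only a moderate correlation on real data.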