Abstract: Teaching logic effectively requires an understanding of the factors that cause logic students to struggle. Formalization exercises, which require the student to produce a formula corresponding to a given natural-language sentence, are a good candidate for scrutiny because they tap into the students' understanding of various aspects of logic. We correlate the difficulty of formalization exercises predicted by a previously proposed difficulty estimation algorithm with two empirical difficulty measures on the Grade Grinder corpus, which contains student solutions to first-order logic (FOL) exercises. We obtain a moderate correlation with both measures, suggesting that the algorithm indeed taps into important sources of difficulty but leaves a fair amount of variance uncaptured. We conduct an error analysis, closely examining exercises that were misclassified, with the aim of identifying additional sources of difficulty. We identify three additional factors that emerge from the difficulty analysis, namely predicate complexity, pragmatic factors, and typicality of the exercises, and discuss the implications of automated difficulty estimation for logic teaching and explainable AI.
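
The correlation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code. It assumes a submission log of (student, exercise, attempt number, correctness) rows. It computes two empirical measures, First Attempt Correct (FAC, the share of students whose first submission was correct) and Average Attempts (AA, taken here as the mean number of submissions per student). It then uses Spearman's rank correlation against the predicted difficulty. The paper's exact definitions and choice of statistic may differ. The log format, the function names (`empirical_difficulty`, `average_ranks`, `spearman`) and the toy data are all hypothetical.

```python
from collections import defaultdict


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def empirical_difficulty(log):
    """Return {exercise: (FAC, AA)} from (student, exercise, attempt_no, correct) rows.

    Assumptions (hypothetical log format): attempt_no starts at 1, every
    student who tried an exercise has an attempt_no == 1 row, and a student's
    attempt count is the highest attempt_no recorded for that exercise.
    """
    first_try = defaultdict(list)   # exercise -> outcomes of first submissions
    attempts = defaultdict(dict)    # exercise -> {student: number of submissions}
    for student, exercise, attempt_no, correct in log:
        if attempt_no == 1:
            first_try[exercise].append(1.0 if correct else 0.0)
        seen = attempts[exercise].get(student, 0)
        attempts[exercise][student] = max(seen, attempt_no)
    return {
        ex: (_mean(first_try[ex]), _mean(attempts[ex].values()))
        for ex in attempts
    }


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = _mean(rx), _mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) if either input is constant


# Toy data, invented for illustration only.
log = [
    ("a", "ex1", 1, True), ("b", "ex1", 1, True),
    ("a", "ex2", 1, False), ("a", "ex2", 2, True), ("b", "ex2", 1, True),
    ("a", "ex3", 1, False), ("a", "ex3", 2, False), ("a", "ex3", 3, True),
    ("b", "ex3", 1, False), ("b", "ex3", 2, True),
]
predicted = {"ex1": 1, "ex2": 2, "ex3": 3}  # hypothetical predicted difficulty levels

emp = empirical_difficulty(log)
exercises = sorted(predicted)
pred = [predicted[e] for e in exercises]
print(round(spearman(pred, [emp[e][0] for e in exercises]), 3))  # vs FAC: -1.0
print(round(spearman(pred, [emp[e][1] for e in exercises]), 3))  # vs AA: 1.0
```

Because FAC is high for easy exercises, a useful difficulty predictor should correlate negatively with FAC and positively with AA. The perfect correlations in this three-exercise toy are an artifact of the invented data. The abstract reports only moderate correlations on the real corpus.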


### What problems does this paper attempt to solve?

The main purpose of this paper is to evaluate and improve automatic difficulty estimation for logic formalization exercises. Specifically, the authors address the following questions:

1. **Effectiveness of an existing algorithm**:
   - The paper first evaluates how well the automatic difficulty estimation algorithm proposed by Perikos et al. (2016) matches actual student performance. This algorithm predicts the difficulty of a formalization exercise from features of the logical formula and of the natural-language sentence.
   - The authors use the Grade Grinder corpus to correlate the algorithm's predicted difficulty with two empirical difficulty measures: First Attempt Correct (FAC) and Average Attempts (AA). The algorithm correlates moderately with both measures. It captures some important sources of difficulty, but a considerable part of the variance remains unexplained.
2. **Misclassification analysis**:
   - To improve difficulty estimation, the authors conduct an error analysis of the exercises the algorithm misclassified. The aim is to identify difficulty factors that the existing algorithm does not consider.
3. **New difficulty factors**:
   - The error analysis yields three additional difficulty factors:
     - **Predicate complexity**: sentences involving multiple predicates or complex predicate structures may be harder to formalize.
     - **Pragmatic factors**: the context and intended meaning of a sentence may affect how students interpret it.
     - **Typicality of the exercises**: how typical or unusual an exercise is may make it harder or easier for students.
4. **Implications for teaching and explainable AI**:
   - Finally, the authors discuss applications of automatic difficulty estimation in logic teaching and explainable AI. A system that assesses the difficulty of formalization exercises more accurately could better help teachers design courses. It could also give users more effective explanations and support.

### Summary

This paper aims to improve difficulty prediction for logic formalization exercises. It evaluates an existing automatic difficulty estimation algorithm and identifies new difficulty factors through misclassification analysis. This work can help improve logic teaching and is relevant to the development of explainable AI.
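
Exercises for an error analysis like the one described above can be selected by mapping an empirical measure onto the same discrete levels the predictor outputs, then listing the disagreements. The sketch below is hypothetical. It assumes three predicted difficulty levels (1 = easiest) and bins exercises into three near-equal groups by descending FAC. It then treats any mismatch between predicted and empirical level as a misclassification. The paper's own criterion for "misclassified" may differ, and the names `to_levels` and `misclassified` as well as the toy values are invented for illustration.

```python
def to_levels(fac, n_levels=3):
    """Map exercises to empirical levels 1..n_levels by FAC (1 = easiest).

    Hypothetical binning: sort by descending FAC (higher FAC = easier) and cut
    into n_levels groups of near-equal size. Ties keep their input order.
    """
    ordered = sorted(fac, key=lambda ex: fac[ex], reverse=True)
    n = len(ordered)
    return {ex: 1 + (i * n_levels) // n for i, ex in enumerate(ordered)}


def misclassified(predicted, fac, n_levels=3):
    """Return sorted (exercise, predicted_level, empirical_level) mismatches."""
    empirical = to_levels(fac, n_levels)
    return sorted(
        (ex, predicted[ex], empirical[ex])
        for ex in predicted
        if predicted[ex] != empirical[ex]
    )


# Toy values, invented for illustration only.
fac = {"e1": 0.9, "e2": 0.8, "e3": 0.6, "e4": 0.5, "e5": 0.3, "e6": 0.1}
predicted = {"e1": 1, "e2": 3, "e3": 2, "e4": 2, "e5": 1, "e6": 3}
print(misclassified(predicted, fac))  # [('e2', 3, 1), ('e5', 1, 3)]
```

Flagged exercises such as `e2` (predicted hard, yet mostly solved on the first try) are the candidates an analyst would then inspect manually for factors like predicate complexity, pragmatics, or typicality.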

Evaluating Automatic Difficulty Estimation of Logic Formalization Exercises
