Efficiently Learning Probabilistic Logical Models by Cheaply Ranking Mined Rules

Jonathan Feldstein, Dominic Phillips, Efthymia Tsamoura
2024-09-25
Abstract: Probabilistic logical models are a core component of neurosymbolic AI and are important models in their own right for tasks that require high explainability. Unlike neural networks, logical models are often handcrafted using domain expertise, making their development costly and prone to errors. While there are algorithms that learn logical models from data, they are generally prohibitively expensive, limiting their applicability in real-world settings. In this work, we introduce precision and recall for logical rules and define their composition as rule utility -- a cost-effective measure to evaluate the predictive power of logical models. Further, we introduce SPECTRUM, a scalable framework for learning logical models from relational data. Its scalability derives from a linear-time algorithm that mines recurrent structures in the data along with a second algorithm that, using the cheap utility measure, efficiently ranks rules built from these structures. Moreover, we derive theoretical guarantees on the utility of the learnt logical model. As a result, SPECTRUM learns more accurate logical models orders of magnitude faster than previous methods on real-world datasets.
Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses the efficiency and scalability of learning **probabilistic logical models**. Specifically, the authors make the following points:

1. **The need for high interpretability**: Probabilistic logical models are a core component of neurosymbolic AI and are important for tasks that require high interpretability. Unlike neural networks, logical models are usually handcrafted by domain experts, which makes their development both expensive and error-prone.
2. **Limitations of existing learning algorithms**: Although algorithms exist that learn logical models from data, they are usually computationally expensive, which limits their use in practice. As a result, existing structure-learning methods struggle to handle large-scale datasets.
3. **New evaluation measures**: To address these problems, the authors introduce precision and recall for logical rules and define their composition as rule utility, a cost-effective measure of the predictive power of logical models.
4. **The SPECTRUM framework**: The authors propose SPECTRUM, a scalable framework for learning logical models from relational data. It mines recurrent structures in the data with a linear-time algorithm and then uses the cheap utility measure to efficiently rank the rules built from these structures. The authors also derive theoretical guarantees on the utility of the learned logical model.
5. **Significantly improved efficiency**: The experimental results show that SPECTRUM is orders of magnitude faster than previous methods on real-world datasets while also learning more accurate logical models.
### Formula summary

- **Utility of a rule**:
  \[ U(\rho) := \frac{P(\rho)\, S(\rho)}{B(\rho)} \cdot R(\rho)\, C(\rho) \]
  where:
  - \(P(\rho)\) is the precision of the rule, defined as:
    \[ P(\rho) := \frac{\vert G_{\text{body}(\rho)\wedge\text{head}(\rho)}\vert}{\vert G_{\text{body}(\rho)}\vert} \]
  - \(S(\rho)\) is the symmetry factor of the rule, defined as:
    \[ S(\rho) := \text{number of subgraphs} \]
  - \(B(\rho)\) is the Bayesian prior of the rule, defined as:
    \[ B(\rho) := \frac{\vert G_{\text{head}(\rho)}\vert}{\sum_{\alpha\in A}\vert G_{\alpha}\vert} \]
  - \(R(\rho)\) is the recall of the rule, defined as:
    \[ R(\rho) := \sum_{\alpha}\ln\left(1 + \vert G_{\text{head}(\rho)=\alpha}^{\text{body}(\rho)\wedge\text{head}(\rho)}\vert\right) \]
  - \(C(\rho)\) is the complexity factor of the rule, defined as:
    \[ C(\rho) := e^{-L(\rho)} \]

With these measures, SPECTRUM can rank candidate rules cheaply and learn logical models more efficiently, which makes it more practical for real-world applications.
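To make the formula summary concrete, here is a minimal Python sketch that evaluates the utility of one rule on a toy dataset. It is an illustration under stated assumptions, not the SPECTRUM implementation. The facts, the rule `friends(x, y) AND smokes(x) -> smokes(y)`, and every function name are hypothetical. The sketch also assumes that the denominator of \(B(\rho)\) counts all facts in the data, that \(S(\rho) = 1\), and that \(L(\rho)\) is the number of body atoms; the summary above does not pin these down.

```python
import math
from collections import Counter

# Hypothetical toy data: each fact is a (predicate, arguments) pair.
facts = {
    ("friends", ("anna", "bob")),
    ("friends", ("bob", "carl")),
    ("friends", ("anna", "carl")),
    ("friends", ("carl", "dave")),
    ("smokes", ("anna",)),
    ("smokes", ("bob",)),
    ("smokes", ("carl",)),
}


def count_groundings(facts):
    """Count groundings of the rule friends(x, y) AND smokes(x) -> smokes(y)."""
    friends = [args for pred, args in facts if pred == "friends"]
    smokers = {args[0] for pred, args in facts if pred == "smokes"}
    # Groundings of the body: friend pairs whose first person smokes.
    body = [(x, y) for x, y in friends if x in smokers]
    # Groundings of body AND head: the second person smokes as well.
    body_and_head = [(x, y) for x, y in body if y in smokers]
    # For each head atom smokes(y), how many groundings support it.
    per_head_atom = Counter(y for _, y in body_and_head)
    return len(body), len(body_and_head), per_head_atom


def utility(precision, symmetry, prior, recall, length):
    """U = (P * S / B) * R * C, with C = exp(-L)."""
    complexity = math.exp(-length)
    return (precision * symmetry / prior) * recall * complexity


n_body, n_body_head, per_head_atom = count_groundings(facts)

precision = n_body_head / n_body
# Head atoms with no supporting grounding contribute ln(1) = 0 to the sum.
recall = sum(math.log(1 + n) for n in per_head_atom.values())
# Assumption: the prior is the share of facts that use the head predicate.
n_head_facts = sum(1 for pred, _ in facts if pred == "smokes")
prior = n_head_facts / len(facts)

# Assumptions: symmetry factor 1 and rule length 2 (two body atoms).
score = utility(precision, symmetry=1, prior=prior, recall=recall, length=2)
print(f"precision={precision:.3f} recall={recall:.3f} utility={score:.3f}")
```

Only grounding counts enter the score, so no probabilistic inference is needed to compare candidate rules. Ranking several rules would amount to computing `score` for each one and sorting in descending order.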