NeuroBack: Improving CDCL SAT Solving using Graph Neural Networks

Wenxi Wang, Yang Hu, Mohit Tiwari, Sarfraz Khurshid, Kenneth McMillan, Risto Miikkulainen
2024-05-09
Abstract: Propositional satisfiability (SAT) is an NP-complete problem that impacts many research fields, such as planning, verification, and security. Mainstream modern SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) algorithm. Recent work aimed to enhance CDCL SAT solvers using Graph Neural Networks (GNNs). However, so far this approach either has not made solving more effective, or has required substantial GPU resources for frequent online model inferences. Aiming to make GNN improvements practical, this paper proposes an approach called NeuroBack, which builds on two insights: (1) predicting the phases (i.e., values) of variables appearing in the majority (or even all) of the satisfying assignments is essential for CDCL SAT solving, and (2) it is sufficient to query the neural model only once for the predictions before the SAT solving starts. Once trained, the offline model inference allows NeuroBack to execute exclusively on the CPU, removing its reliance on GPU resources. To train NeuroBack, a new dataset called DataBack containing 120,286 data samples is created. NeuroBack is implemented as an enhancement to a state-of-the-art SAT solver called Kissat. As a result, it allowed Kissat to solve up to 5.2% and 7.4% more problems on two recent SAT competition problem sets, SATCOMP-2022 and SATCOMP-2023, respectively. NeuroBack therefore shows how machine learning can be harnessed to improve SAT solving in an effective and practical manner.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two main challenges in Boolean satisfiability (SAT) solving:

1. **Improving the effectiveness of SAT solvers**: Modern mainstream SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) algorithm. Despite significant progress, efficiency bottlenecks remain on large and complex problems. Existing attempts to enhance CDCL SAT solvers with Graph Neural Networks (GNNs) either have not made solving more effective or require substantial GPU resources for frequent online model inference.
2. **Reducing dependence on GPU resources**: Current GNN methods query the model repeatedly while the solver is running, which leads to high computational demands, especially in parallel deployments. A lack of sufficient GPU resources can become a major performance bottleneck for these methods.

To address these issues, the paper proposes a new method called NeuroBack. The core idea of NeuroBack is to run model inference once, offline, before solving begins, and to use the resulting static predictions to guide CDCL SAT solving. Specifically, NeuroBack relies on the following:

- **Predicting the phase of variables**: NeuroBack uses a trained GNN model to predict the phase (i.e., value) that each variable takes in most or all satisfying assignments, and uses these predictions to improve the phase selection heuristic of the CDCL solver.
- **Independence from GPU resources**: Because the model is queried only once before solving starts, NeuroBack, once trained, can execute exclusively on the CPU, which makes the method more practical and scalable.

Through these innovations, NeuroBack improves the effectiveness of SAT solvers while reducing dependence on expensive computational resources, making it more feasible for practical applications.
Experimental results show that, when implemented as an enhancement to the state-of-the-art solver Kissat, NeuroBack allows it to solve up to 5.2% and 7.4% more problems on the SATCOMP-2022 and SATCOMP-2023 problem sets, respectively.
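
The two insights summarized above (predict majority phases, and query the predictor only once before solving) can be illustrated with a small Python sketch. This is not the paper's implementation: the search below is a toy DPLL-style backtracking procedure rather than Kissat or full CDCL, and the brute-force `majority_phases` function is only a stand-in for the trained GNN, usable on tiny formulas. All function names, the example formula, and the default phase of `False` are assumptions made for illustration.

```python
from itertools import product

# Clauses in DIMACS style: 3 means x3, -3 means NOT x3 (toy formula, assumed).
CNF = [[1, 2], [1, -2], [-1, 3]]
NUM_VARS = 3

def satisfies(assignment, cnf):
    """True if every clause contains a literal made true by `assignment`."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in cnf)

def majority_phases(cnf, num_vars):
    """Stand-in for the learned model: enumerate all satisfying assignments
    and return each variable's majority value (ties resolve to True)."""
    models = []
    for bits in product([False, True], repeat=num_vars):
        a = {v: bits[v - 1] for v in range(1, num_vars + 1)}
        if satisfies(a, cnf):
            models.append(a)
    phases = {v: 2 * sum(m[v] for m in models) >= len(models)
              for v in range(1, num_vars + 1)}
    return phases, models

def unit_propagate(cnf, assignment):
    """Assign literals forced by unit clauses; return None on a conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return None  # all literals false: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

def solve(cnf, num_vars, phase_hint, assignment=None, stats=None):
    """Backtracking search that tries the hinted phase first on each decision.
    The hints are computed once, up front, and never refreshed during search."""
    stats = stats if stats is not None else {"backtracks": 0}
    assignment = unit_propagate(cnf, assignment or {})
    if assignment is None:
        return None, stats
    free = [v for v in range(1, num_vars + 1) if v not in assignment]
    if not free:
        return assignment, stats
    var = free[0]
    first = phase_hint.get(var, False)  # default phase False is an assumption
    for value in (first, not first):
        result, _ = solve(cnf, num_vars, phase_hint,
                          {**assignment, var: value}, stats)
        if result is not None:
            return result, stats
        stats["backtracks"] += 1
    return None, stats

# One "inference" before solving, then a plain CPU-only search.
hints, models = majority_phases(CNF, NUM_VARS)
model_hinted, stats_hinted = solve(CNF, NUM_VARS, hints)
model_plain, stats_plain = solve(CNF, NUM_VARS, {})
print("hinted backtracks:", stats_hinted["backtracks"])
print("unhinted backtracks:", stats_plain["backtracks"])
```

On this formula, `x1` is true in every satisfying assignment, so starting the first decision with the hinted phase avoids the conflict that the unhinted run hits when it tries `x1 = False` first. A real CDCL solver additionally learns clauses and changes saved phases as it runs; the sketch only shows how predictions made once, before solving, can steer the search away from early conflicts.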