Abstract: We present a general framework for applying learning algorithms and heuristic guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Brázdil et al., significantly extending them as well as refining several details and fixing errors.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the verification of Markov decision processes (MDPs), specifically how to improve performance by avoiding exhaustive exploration of the state space. It proposes a general framework that uses learning algorithms and heuristic guidance to achieve this goal.

**Main Contributions Include:**

1. **Probabilistic Reachability Problem**:
   - The framework focuses on the probabilistic reachability problem, a core problem in verification. It is instantiated in two scenarios:
     - The first scenario assumes complete knowledge of the MDP, including exact transition probabilities. This method performs heuristic-driven partial exploration to obtain precise upper and lower bounds on the required probability.
     - The second scenario can only sample the MDP, without knowing the exact transition dynamics. In this case, the method provides probabilistic guarantees, i.e., estimates of the upper and lower bounds, and thus an effective stopping criterion for the approximation.
2. **Algorithm Framework**:
   - A scalable framework is proposed to efficiently solve the reachability problem on "full-information" MDPs, and it is extended to arbitrary MDPs.
   - A model-free PAC learning algorithm for "limited-information" MDPs is introduced and extended to arbitrary MDPs.
3. **Statistical Model Checking**:
   - In the limited-information setting, a PAC model-free algorithm based on Delayed Q-Learning is proposed, which provides statistical upper and lower bounds on the maximum reachability probability.
4. **Impact on Related Work**:
   - This work directly influenced many subsequent studies, particularly those applying BRTDP methods and their variants, which have been extended to areas such as long-term average rewards, continuous-time Markov chains, continuous-space MDPs, and stochastic games.

In summary, the main goal of this paper is to improve the efficiency of MDP verification through heuristic methods and learning algorithms, especially when dealing with large-scale systems, thus avoiding traditional exhaustive exploration methods.
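
The full-information scenario described above (heuristic-driven partial exploration that maintains upper and lower bounds) can be illustrated with a minimal BRTDP-style sketch. This is not the paper's implementation: the toy MDP, the function name `brtdp`, and the parameters `eps`, `seed`, and `max_episodes` are all hypothetical, and the sketch assumes every run reaches the goal or the sink with probability 1, so it omits the extension to arbitrary MDPs that the summary mentions.

```python
import random

# Hypothetical toy MDP: state -> action -> {successor: probability}.
# "goal" and "sink" are absorbing and carry no actions of their own.
MDP = {
    "s0": {"a": {"s1": 0.5, "sink": 0.5}, "b": {"s2": 0.4, "sink": 0.6}},
    "s1": {"c": {"goal": 0.9, "sink": 0.1}},
    "s2": {"d": {"goal": 1.0}},
}


def brtdp(mdp, init, goal, sink, eps=1e-6, seed=0, max_episodes=10_000):
    """Bound the maximum probability of reaching `goal` from `init`.

    Assumption: every run ends in `goal` or `sink` almost surely,
    otherwise the upper bound need not converge.
    """
    rng = random.Random(seed)
    upper, lower = {}, {}  # bounds are stored only for visited states

    def up(s):
        return 1.0 if s == goal else 0.0 if s == sink else upper.get(s, 1.0)

    def lo(s):
        return 1.0 if s == goal else 0.0 if s == sink else lower.get(s, 0.0)

    def q(bound, s, a):
        # Expected value of `bound` over the successors of action a in s.
        return sum(p * bound(t) for t, p in mdp[s][a].items())

    for _ in range(max_episodes):
        if up(init) - lo(init) < eps:
            break
        # Explore: follow the action with the best upper bound (the heuristic).
        path, s = [], init
        while s not in (goal, sink):
            path.append(s)
            a = max(mdp[s], key=lambda act: q(up, s, act))
            succs = mdp[s][a]
            s = rng.choices(list(succs), weights=list(succs.values()))[0]
        # Update: back up both bounds along the sampled path, last state first.
        for s in reversed(path):
            upper[s] = max(q(up, s, a) for a in mdp[s])
            lower[s] = max(q(lo, s, a) for a in mdp[s])
    return lo(init), up(init), set(upper)


if __name__ == "__main__":
    low, high, explored = brtdp(MDP, "s0", "goal", "sink")
    print(f"bounds: [{low}, {high}], explored states: {sorted(explored)}")
```

In this toy model, action `b` can never yield more than 0.4, so once the bounds for action `a` close in on 0.45 the gap at `s0` vanishes and `s2` is never visited. That is the intended effect of guiding exploration by the upper bound: states that cannot change the answer are left unexplored.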

Learning Algorithms for Verification of Markov Decision Processes

Learning Probabilistic Models for Model Checking: an Evolutionary Approach and an Empirical Study

A Lazy Abstraction Algorithm for Markov Decision Processes: Theory and Initial Evaluation

Learning Markov Decision Processes for Model Checking

Verification of deep probabilistic models

Robust Anytime Learning of Markov Decision Processes

Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies

Certified Policy Verification and Synthesis for MDPs under Distributional Reach-avoidance Properties

Active model learning of stochastic reactive systems (extended version)

What Are the Odds? Improving the foundations of Statistical Model Checking

Formal Verification of Unknown Dynamical Systems via Gaussian Process Regression

Abstraction-Refinement for Hierarchical Probabilistic Models

Learning Markov State Abstractions for Deep Reinforcement Learning

Verification and Control of Turn-Based Probabilistic Real-Time Games

Learning Weighted Assumptions for Compositional Verification of Markov Decision Processes

Linear-Time Verification of Data-Aware Processes Modulo Theories via Covers and Automata (Extended Version)

Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes

Online Model-free Safety Verification for Markov Decision Processes Without Safety Violation

Automatic Verification of Competitive Stochastic Systems

Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception

Runtime Verification of Learning Properties for Reinforcement Learning Algorithms