A Maintenance Planning Framework using Online and Offline Deep Reinforcement Learning

Zaharah A. Bukhsh, Nils Jansen, Hajo Molegraaf
DOI: https://doi.org/10.1007/s00521-023-08560-7
2023-04-18
Abstract: Cost-effective asset management is an area of interest across several industries. Specifically, this paper develops a deep reinforcement learning (DRL) solution to automatically determine an optimal rehabilitation policy for continuously deteriorating water pipes. We approach the problem of rehabilitation planning in an online and offline DRL setting. In online DRL, the agent interacts with a simulated environment of multiple pipes with distinct lengths, materials, and failure rate characteristics. We train the agent using deep Q-learning (DQN) to learn an optimal policy with minimal average costs and reduced failure probability. In offline learning, the agent uses static data, e.g., DQN replay data, to learn an optimal policy via a conservative Q-learning algorithm without further interactions with the environment. We demonstrate that DRL-based policies improve over standard preventive, corrective, and greedy planning alternatives. Additionally, learning from the fixed DQN replay dataset in an offline setting further improves the performance. The results warrant that the existing deterioration profiles of water pipes consisting of large and diverse states and action trajectories provide a valuable avenue to learn rehabilitation policies in the offline setting, which can be further fine-tuned using the simulator.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how deep reinforcement learning (DRL) can automatically derive an optimal rehabilitation policy for continuously deteriorating water pipes while meeting cost and performance requirements. The authors develop a framework that combines online and offline DRL to optimize rehabilitation plans, with the following goals:

1. **Minimize the average cost**: Learn an intervention policy that reduces the cost of maintaining and replacing pipes.
2. **Reduce the probability of failure**: Keep the pipe network reliable and lower the likelihood of sudden failures.
3. **Improve decision-making efficiency**: Find better maintenance policies than the standard preventive, corrective, and greedy planning alternatives.

### Method overview

- **Online deep reinforcement learning (online DRL)**:
  - A deep Q-learning (DQN) agent interacts with a simulated environment and learns a policy that makes good decisions for pipes with different lengths, materials, and failure rate characteristics.
  - The goal is to maximize the cumulative reward, that is, to minimize the total maintenance cost and the probability of failure over a given time horizon.
- **Offline deep reinforcement learning (offline DRL)**:
  - The agent learns from a static dataset (for example, the data in the DQN replay buffer), with no further interaction with the environment.
  - The conservative Q-learning (CQL) algorithm learns a policy from the fixed dataset while guarding against overestimating the value of actions that do not appear in the data.

### Key contributions

1. **Innovative solution**: Offline reinforcement learning is applied, reportedly for the first time, to a practical problem: rehabilitation planning for water pipes.
2. **Improve existing methods**: Reusing the data accumulated during online learning in the offline setting further improves the learned policy.
3. **Verify effectiveness**: Experimental results show that the DRL-based policies improve over standard preventive, corrective, and greedy planning alternatives.

### Summary

This research shows the potential of deep reinforcement learning for complex real-world problems and offers a useful reference for other fields such as asset management, health, manufacturing, and transportation.
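
To make the offline step in the method overview more concrete, the following is a minimal sketch of a CQL-style loss for discrete actions: a standard one-step temporal-difference (TD) error plus a conservative penalty that pushes down Q-values of actions not supported by the dataset. It is not the authors' implementation. The function names, the action labels, the state tuples, and the values of `gamma` and `alpha` are all assumptions chosen for illustration, and the Q-functions are plain Python callables rather than neural networks.

```python
import math

# Hypothetical action labels; the paper's actual action set is not given here.
ACTIONS = ("do_nothing", "maintain", "replace")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_dqn_loss(batch, q_fn, target_q_fn, gamma=0.99, alpha=1.0):
    """CQL-style loss on a fixed batch of transitions (illustrative only).

    batch: list of (state, action_index, reward, next_state, done) tuples.
    q_fn / target_q_fn: callables mapping a state to one Q-value per action.
    Returns (total_loss, td_loss, conservative_penalty).
    """
    td_terms, penalty_terms = [], []
    for state, action, reward, next_state, done in batch:
        q_values = q_fn(state)
        q_taken = q_values[action]
        # One-step TD target, bootstrapping from the target Q-function.
        bootstrap = 0.0 if done else gamma * max(target_q_fn(next_state))
        td_terms.append(0.5 * (q_taken - (reward + bootstrap)) ** 2)
        # Conservative term: soft maximum over all actions minus the Q-value
        # of the action that actually appears in the dataset.
        penalty_terms.append(logsumexp(q_values) - q_taken)
    td_loss = sum(td_terms) / len(td_terms)
    penalty = sum(penalty_terms) / len(penalty_terms)
    return td_loss + alpha * penalty, td_loss, penalty


# Two made-up transitions; states are hypothetical (failure rate, age) pairs
# and rewards are negative costs.
batch = [
    ((0.2, 30.0), 0, -1.0, (0.3, 31.0), False),
    ((0.9, 60.0), 2, -2.0, (0.0, 0.0), True),
]


def zero_q(state):
    """Untrained Q-function that returns 0 for every action."""
    return [0.0 for _ in ACTIONS]


total, td, pen = cql_dqn_loss(batch, zero_q, zero_q)
print(f"td={td:.4f} penalty={pen:.4f} total={total:.4f}")
```

The penalty is never negative, because the log-sum-exp over all actions is at least as large as any single Q-value. It shrinks only when the dataset action dominates the alternatives, which is how this kind of loss discourages optimistic estimates for actions the replay data never tried.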