Abstract: Scheduling problems are so diverse and complex that few general-purpose solvers handle them efficiently. In this study, we develop a two-stage framework that combines reinforcement learning (RL) with traditional operations research (OR) algorithms to deal with complex scheduling problems efficiently. The scheduling problem is solved in two stages: a finite Markov decision process (MDP) followed by a mixed-integer programming process. This offers a novel and general paradigm for combining RL with OR approaches to scheduling that leverages the respective strengths of each: the MDP stage narrows down the search space of the original problem through an RL method, while the mixed-integer programming stage is solved by an OR algorithm. The two stages are performed iteratively and interactively until the termination criterion is met. Based on this idea, two implementations of the combined RL and OR method are put forward. The agile Earth observation satellite scheduling problem is used as an example to demonstrate the effectiveness of the proposed framework and methods. The convergence and generalization capability of the methods are verified by their performance on the training scenarios, while their efficiency and accuracy are tested on 50 untrained scenarios. The results show that the proposed algorithms stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems. In addition, the RL-based optimization algorithms are found to scale better than non-learning algorithms. This work reveals the advantage of combining RL methods with heuristic or mathematical programming methods for solving complex combinatorial optimization problems.
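The two-stage loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' method: the toy instance (`TASKS`, `BUDGET`), the function names `solve_reduced` and `two_stage`, and the learning rule, which is a simple bandit-style value update and not a full MDP solver. Brute-force enumeration stands in for the mixed-integer programming solver so that the example needs only the standard library.

```python
import itertools
import random

# Hypothetical toy instance: (profit, duration) per task, one resource with a time budget.
TASKS = [(6, 4), (5, 3), (9, 7), (4, 2), (7, 5), (3, 2), (8, 6), (2, 1)]
BUDGET = 10


def solve_reduced(candidates):
    """Stage 2 stand-in: exact search over the reduced candidate set.

    A real implementation would pass this subproblem to a MIP solver;
    enumeration is used here only to keep the sketch dependency-free.
    """
    best_value, best_subset = 0, ()
    for r in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, r):
            duration = sum(TASKS[i][1] for i in subset)
            value = sum(TASKS[i][0] for i in subset)
            if duration <= BUDGET and value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset


def two_stage(episodes=200, k=4, epsilon=0.2, lr=0.1, seed=0):
    """Alternate between a learned candidate selection and an exact subproblem solve."""
    rng = random.Random(seed)
    q = [0.0] * len(TASKS)  # learned value estimate per task
    best_value, best_subset = 0, ()
    for _ in range(episodes):  # termination criterion: fixed episode count
        # Stage 1: narrow the search space to k candidate tasks (epsilon-greedy).
        if rng.random() < epsilon:
            candidates = rng.sample(range(len(TASKS)), k)
        else:
            candidates = sorted(range(len(TASKS)), key=lambda i: -q[i])[:k]
        # Stage 2: solve the reduced problem exactly.
        value, subset = solve_reduced(candidates)
        # Feedback: tasks used in the solved schedule move toward its value,
        # unused candidates move toward zero.
        for i in candidates:
            target = value if i in subset else 0.0
            q[i] += lr * (target - q[i])
        if value > best_value:
            best_value, best_subset = value, tuple(sorted(subset))
    return best_value, best_subset


if __name__ == "__main__":
    print(two_stage())
```

The design point this sketch is meant to show is the division of labor: the learned stage only decides which part of the problem to look at, and the exact stage guarantees that whatever is returned for that part is feasible and optimal within it.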

Tuning of reinforcement learning parameters applied to SOP using the Scott–Knott method

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Random sampling-based automatic parameter tuning for nonlinear programming solvers

A Response Surface Model Approach to Parameter Estimation of Reinforcement Learning for the Travelling Salesman Problem

Reinforcement Learning Driven Heuristic Optimization

Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances

Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling

A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems

Solving Optimization Problems Using Reinforcement Learning, with Applications to Inverse Problems

A novel reinforcement learning-inspired tunicate swarm algorithm for solving global optimization and engineering design problems

Stochastic Constraint Programming as Reinforcement Learning

Optimizing parameters in swarm intelligence using reinforcement learning: An application of Proximal Policy Optimization to the iSOMA algorithm

Deep reinforcement learning and parameter transfer based approach for the multi-objective agile earth observation satellite scheduling problem

A multi-swarm optimizer with a reinforcement learning mechanism for large-scale optimization

Online meta-learning by parallel algorithm competition

Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

A PSO-Assisted Reinforcement Learning Algorithm for Job Shop Scheduling

Principle and performance validation of search and rescue team algorithm

Strategically Conservative Q-Learning

Reinforcement learning for the traveling salesman problem with refueling

Reinforcement-learning-based parameter adaptation method for particle swarm optimization