Graph Reinforcement Learning for Network Control via Bi-Level Optimization

Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira
2023-05-16
Abstract: Optimization problems over dynamic networks have been extensively studied and widely used in the past decades to formulate numerous real-world problems. However, (1) traditional optimization-based approaches do not scale to large networks, and (2) the design of good heuristics or approximation algorithms often requires significant manual trial-and-error. In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. Instead of naively computing actions over high-dimensional graph elements, e.g., edges, we propose a bi-level formulation where we (1) specify a desired next state via RL, and (2) solve a convex program to best achieve it, leading to drastically improved scalability and performance. We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework.
Machine Learning, Systems and Control, Optimization and Control
What problem does this paper attempt to address?
The paper addresses three key challenges in dynamic network control:

1. **Optimization over large-scale networks**: Traditional optimization-based methods do not scale well to large networks.
2. **Design of efficient algorithms**: Designing effective heuristics or approximation algorithms usually requires extensive manual trial and error.
3. **Nonlinearity, stochasticity, and multi-stage decision-making**: Traditional methods struggle with nonlinear dynamics, stochastic systems, and the curse of dimensionality in time-expanded networks.

To address these challenges, the authors propose a bi-level optimization framework based on graph networks and reinforcement learning. The framework has the following features:

- **Bi-level optimization**: The upper level uses reinforcement learning to specify a desired next state, and the lower level solves a convex optimization problem to best achieve that state. Because the learned policy acts on the desired state rather than on every edge of the graph, scalability improves while optimization performance is maintained.
- **Data-driven strategy**: Algorithm design is automated by learning from data, yielding efficient algorithms without sacrificing optimality.
- **Flexibility and adaptability**: The framework handles a broad class of dynamic network control problems, including practical applications such as supply chain management and dynamic vehicle routing.

### Paper contributions

The main contributions of the paper are:

1. **Proposing a bi-level reinforcement learning method based on graph networks**, which combines the advantages of direct optimization and reinforcement learning.
2. **Exploring the architectural components and design decisions within the framework**, such as the choice of graph aggregation function, action parameterization, and exploration strategy, and their impact on system performance.
3. **Demonstrating the advantages of the method in terms of performance, scalability, and robustness**: on both synthetic test problems and practical problems (supply chain inventory management and dynamic vehicle routing), the method outperforms classical optimization methods, domain-specific heuristics, and pure end-to-end reinforcement learning.

### Specific problem solving

- **Large-scale network optimization**: The bi-level framework operates efficiently on large networks, avoiding the computational bottlenecks of traditional methods.
- **Efficient algorithm design**: The data-driven strategy reduces the need for manual trial and error and produces efficient control algorithms automatically.
- **Nonlinearity, stochasticity, and multi-stage decision-making**: Combining reinforcement learning with optimization lets the framework handle nonlinear dynamics, stochasticity, and multi-stage decisions.

In summary, the paper proposes a bi-level optimization framework for dynamic network control and demonstrates its performance and applicability on supply chain inventory management and dynamic vehicle routing.
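
The bi-level scheme described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation, and every name in it is hypothetical. The upper level is a stand-in for the learned graph-network policy: it turns per-node scores into a desired next state with a softmax. The lower level is a simplified convex program, assumed here to be an unconstrained quadratic over signed edge flows and solved by gradient descent. The paper's actual lower-level problems and constraints (for example capacities or non-negative flows) are not reproduced.

```python
import math

def desired_next_state(scores, total):
    """Upper level (stand-in for the RL policy): softmax of scores -> desired amount per node."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [total * e / z for e in exps]

def apply_flows(state, edges, flows):
    """Next state after moving flows[k] units from edges[k][0] to edges[k][1].
    A negative flow moves units in the reverse direction."""
    nxt = list(state)
    for (i, j), f in zip(edges, flows):
        nxt[i] -= f
        nxt[j] += f
    return nxt

def solve_lower_level(state, target, edges, flow_cost=0.01, iters=500):
    """Lower level: minimize 0.5*||next - target||^2 + 0.5*flow_cost*||flows||^2
    over edge flows by gradient descent (the objective is a convex quadratic)."""
    degree = [0] * len(state)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    # 2*max_degree bounds the largest eigenvalue of the graph Laplacian,
    # so this step size is at most 1/L for the quadratic objective.
    step = 1.0 / (2 * max(degree) + flow_cost)
    flows = [0.0] * len(edges)
    for _ in range(iters):
        nxt = apply_flows(state, edges, flows)
        resid = [n - t for n, t in zip(nxt, target)]
        grad = [resid[j] - resid[i] + flow_cost * f
                for (i, j), f in zip(edges, flows)]
        flows = [f - step * g for f, g in zip(flows, grad)]
    return flows

# Toy example: three nodes in a triangle, 12 units of a commodity in total.
state = [10.0, 0.0, 2.0]
edges = [(0, 1), (1, 2), (2, 0)]
scores = [0.0, 0.0, 0.0]  # hypothetical policy output: no node is preferred
target = desired_next_state(scores, sum(state))
flows = solve_lower_level(state, target, edges)
next_state = apply_flows(state, edges, flows)
```

In this sketch the policy outputs one number per node instead of one per edge, which is the dimensionality reduction the bi-level formulation relies on. The lower level then chooses the edge flows, and the small `flow_cost` term trades off reaching the target against moving units around.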