Ali Jadbabaie, Devavrat Shah, Sean R. Sinclair
Abstract: The framework of decision-making, modeled as a Markov Decision Process (MDP), typically assumes a single objective. However, most practical scenarios involve considering tradeoffs between multiple objectives. With that as the motivation, we consider the task of finding the Pareto front of achievable tradeoffs in the context of Linear Quadratic Regulator (LQR), a canonical example of a continuous, infinite horizon MDP. As our first contribution, we establish that the Pareto front for LQR is characterized by linear scalarization, wherein a linear combination of the objectives creates a single objective, and by varying the weight of the linear combination one achieves different possible tradeoffs. That is, each tradeoff point on the Pareto front of multi-objective LQR turns out to be a single objective LQR where the objective is a convex combination of the multiple objectives. Intellectually, our work provides an important example of linear scalarization being sufficient for a non-convex multi-objective problem. As our second contribution, we establish the smoothness of the Pareto front, showing that the optimal control to an $\epsilon$-perturbation to a scalarization parameter yields an $O(\epsilon)$-approximation to its objective performance. Together these results highlight a simple algorithm to approximate the continuous Pareto front by optimizing over a grid of scalarization parameters. Unlike other scalarization methods, each individual optimization problem retains the structure of a single objective LQR problem, making them computationally feasible. Lastly, we extend the results to consider certainty equivalence, where the unknown dynamics are replaced with estimates.
What problem does this paper attempt to address?
This paper asks how to characterize and approximate the Pareto front of the Multi-Objective Linear Quadratic Regulator (MObjLQR), that is, how to handle trade-offs between multiple objectives in a continuous, infinite-horizon Markov Decision Process (MDP). Standard formulations assume a single objective, whereas practical scenarios usually require balancing several. The main contributions of the paper are as follows:
1. **Characterizing the Pareto front of MObjLQR**:
- The paper proves that the Pareto front of MObjLQR is characterized by linear scalarization: each Pareto-optimal control is the solution of a single-objective LQR problem whose cost matrices are weighted combinations of the original cost matrices. This shows that linear scalarization can be sufficient even for a non-convex multi-objective problem.
- For a weight vector \(w\), the scalarized cost is:
\[
L_w(K)=\sum_{i} w_i L_i(K)=L(K, \sum_{i} w_i Q_i, \sum_{i} w_i R_i)=L(K, Q_w, R_w),
\]
where \(Q_w = \sum_{i} w_i Q_i\), \(R_w=\sum_{i} w_i R_i\), and \(L_i(K) = L(K, Q_i, R_i)\) is the cost of the controller \(K\) under the \(i\)-th objective.
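The identity above can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes a discrete-time system \(x_{t+1} = A x_t + B u_t\) with feedback \(u_t = -K x_t\) and an initial state with identity covariance, so that the cost equals the trace of a Lyapunov solution. The matrices `A`, `B`, `K`, the two objectives, and the helper `lqr_cost` are all hypothetical choices made for illustration.

```python
import numpy as np

def lqr_cost(K, A, B, Q, R):
    """Infinite-horizon cost of u_t = -K x_t, assuming x_0 has identity covariance."""
    A_cl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("K does not stabilize (A, B)")
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # P solves the Lyapunov equation P = M + A_cl^T P A_cl (via vectorization).
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return float(np.trace(P))

# Hypothetical system and a fixed stabilizing controller (illustrative values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[1.0, 2.0]])

# Two hypothetical objectives (Q_i, R_i) and a weight vector w on the simplex.
Qs = [np.eye(2), 0.1 * np.eye(2)]
Rs = [np.array([[0.1]]), np.array([[1.0]])]
w = np.array([0.3, 0.7])

Q_w = sum(wi * Qi for wi, Qi in zip(w, Qs))
R_w = sum(wi * Ri for wi, Ri in zip(w, Rs))

scalarized = lqr_cost(K, A, B, Q_w, R_w)  # L(K, Q_w, R_w)
weighted_sum = sum(wi * lqr_cost(K, A, B, Qi, Ri) for wi, Qi, Ri in zip(w, Qs, Rs))
print(scalarized, weighted_sum)  # the two values should agree up to rounding
```

The agreement follows because, for a fixed \(K\), the cost is linear in the pair \((Q, R)\); the sketch only confirms this for one controller and one weight vector.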
2. **An algorithm for approximating the Pareto front**:
- The paper proposes approximating the Pareto front by discretizing the scalarization parameters. For any desired precision \(\epsilon\), one builds an \(\epsilon\)-net of scalarization parameters and solves for the optimal control at each point of the net. Because an \(\epsilon\)-perturbation of the scalarization parameter changes the objective performance by only \(O(\epsilon)\), the resulting set of controls approximates the continuous Pareto front to the desired accuracy.
- The specific steps of the algorithm are as follows:
1. Create an \(\epsilon\)-net of scalarization parameters \(\{w^{(1)}, w^{(2)},\ldots, w^{(k)}\}\).
2. For each \(w^{(j)}\), solve the single-objective LQR problem:
\[
K_{w^{(j)}}=\arg \min_{K \in S} L(K, Q_{w^{(j)}}, R_{w^{(j)}}),
\]
where \(Q_{w^{(j)}}=\sum_{i} w^{(j)}_i Q_i\) and \(R_{w^{(j)}}=\sum_{i} w^{(j)}_i R_i\).
3. Collect all \(K_{w^{(j)}}\) as the approximate Pareto front.
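The steps above can be sketched for two objectives as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes discrete-time dynamics with feedback \(u_t = -K x_t\), uses a uniform grid on \([0, 1]\) as the net of weights, and solves each single-objective problem by plain fixed-point iteration on the Riccati equation. The function names (`solve_lqr`, `lqr_cost`, `approximate_pareto_front`) and all numerical values are hypothetical.

```python
import numpy as np

def solve_lqr(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Optimal gain K (u_t = -K x_t) via fixed-point iteration on the Riccati equation."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def lqr_cost(K, A, B, Q, R):
    """Infinite-horizon cost of u_t = -K x_t, assuming x_0 has identity covariance."""
    A_cl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("K does not stabilize (A, B)")
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(lhs, (Q + K.T @ R @ K).reshape(-1)).reshape(n, n)
    return float(np.trace(P))

def approximate_pareto_front(A, B, Qs, Rs, num_points):
    """Two-objective case: sweep w = (t, 1 - t) over a uniform grid on [0, 1]."""
    front = []
    for t in np.linspace(0.0, 1.0, num_points):
        w = (t, 1.0 - t)
        Q_w = w[0] * Qs[0] + w[1] * Qs[1]
        R_w = w[0] * Rs[0] + w[1] * Rs[1]
        K_w = solve_lqr(A, B, Q_w, R_w)  # a single-objective LQR solve
        costs = [lqr_cost(K_w, A, B, Q, R) for Q, R in zip(Qs, Rs)]
        front.append((w, K_w, costs))
    return front

# Hypothetical system and objectives (illustrative values only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qs = [np.eye(2), 0.1 * np.eye(2)]
Rs = [np.array([[0.1]]), np.array([[1.0]])]

front = approximate_pareto_front(A, B, Qs, Rs, num_points=5)
for w, K_w, costs in front:
    print(w, costs)
```

Each pass through the loop is an ordinary single-objective LQR solve, which is the computational point of linear scalarization. With more than two objectives, the grid would have to cover the full weight simplex rather than an interval.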
3. **Extension to the case of unknown system dynamics**:
- The paper also treats the case of unknown system dynamics via certainty equivalence, in which the unknown dynamics are replaced with estimates. It proves that the algorithm above remains valid as long as the estimation error of the dynamics is of the same order as the required approximation accuracy.
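A small numerical sketch of certainty equivalence may help here. It is only an illustration under assumptions: the "estimates" `A_hat` and `B_hat` are produced by a deterministic perturbation of size `eps` rather than by any real estimation procedure, the weights are fixed at \((0.5, 0.5)\), and the helpers `solve_lqr` and `lqr_cost` are hypothetical names. The controller is designed on the estimates and then evaluated on the true system.

```python
import numpy as np

def solve_lqr(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Optimal gain K (u_t = -K x_t) via fixed-point iteration on the Riccati equation."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def lqr_cost(K, A, B, Q, R):
    """Infinite-horizon cost of u_t = -K x_t, assuming x_0 has identity covariance."""
    A_cl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("K does not stabilize (A, B)")
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(lhs, (Q + K.T @ R @ K).reshape(-1)).reshape(n, n)
    return float(np.trace(P))

# Hypothetical true system and one scalarized objective with w = (0.5, 0.5).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q_w = 0.5 * np.eye(2) + 0.5 * (0.1 * np.eye(2))
R_w = 0.5 * np.array([[0.1]]) + 0.5 * np.array([[1.0]])

# Stand-in for estimated dynamics: a deterministic perturbation of size eps.
eps = 1e-3
A_hat = A + eps * np.ones_like(A)
B_hat = B + eps * np.ones_like(B)

K_opt = solve_lqr(A, B, Q_w, R_w)         # uses the true dynamics
K_ce = solve_lqr(A_hat, B_hat, Q_w, R_w)  # certainty-equivalent design

opt_cost = lqr_cost(K_opt, A, B, Q_w, R_w)
ce_cost = lqr_cost(K_ce, A, B, Q_w, R_w)  # evaluated on the true system
print(opt_cost, ce_cost - opt_cost)
```

The printed gap is the extra cost incurred on the true system by designing with the perturbed model. The sketch makes no claim about the rate at which this gap shrinks with `eps`; that is what the paper's analysis addresses.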
Together, these contributions establish that linear scalarization suffices for multi-objective LQR and yield a practical algorithm for approximating the Pareto front, in which every subproblem remains a standard single-objective LQR problem.