Abstract: The idea of embedding optimization problems into deep neural networks as optimization layers to encode constraints and inductive priors has taken hold in recent years. Most existing methods focus on implicitly differentiating Karush-Kuhn-Tucker (KKT) conditions in a way that requires expensive computations on the Jacobian matrix, which can be slow and memory-intensive. In this paper, we develop a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically in the form of convex optimization problems with polyhedral constraints) in a fast and recursive way. Alt-Diff decouples the differentiation procedure into a primal update and a dual update in an alternating way. Accordingly, Alt-Diff substantially decreases the dimensions of the Jacobian matrix, especially for optimization with large-scale constraints, and thus increases the computational speed of implicit differentiation. We show that the gradients obtained by Alt-Diff are consistent with those obtained by differentiating KKT conditions. In addition, we propose to truncate Alt-Diff to further accelerate the computational speed. Under some standard assumptions, we show that the truncation error of gradients is upper bounded by the same order of variables' estimation error. Therefore, Alt-Diff can be truncated to further increase computational speed without sacrificing much accuracy. A series of comprehensive experiments validate the superiority of Alt-Diff.
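
For orientation, the problem class named in the abstract and the size of the linear systems involved can be written out as below. This is a sketch in notation chosen here for illustration (n variables, p equality constraints, m inequality constraints, penalty parameter ρ); none of these symbols appear in the abstract. The n × n system in the last line assumes an ADMM-style splitting with a slack variable for the inequalities, which is likewise an assumption and not something the abstract states.

```latex
\begin{aligned}
&\min_{x \in \mathbb{R}^n} \; f(x;\theta)
  \quad \text{s.t.} \quad Ax = b,\; Gx \le h,
  \qquad A \in \mathbb{R}^{p \times n},\; G \in \mathbb{R}^{m \times n} \\[4pt]
&\text{KKT differentiation: one linear system in }
  (\partial x,\, \partial\lambda,\, \partial\nu)
  \text{ of size } (n+p+m) \times (n+p+m) \\[4pt]
&\text{Alternating scheme (assumed splitting): systems with }
  \nabla_x^2 f + \rho A^\top A + \rho G^\top G
  \text{ of size } n \times n
\end{aligned}
```

Under these assumptions the system solved per step no longer grows with the number of constraints m + p, which matches the abstract's remark that the saving is largest for problems with large-scale constraints.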

What problem does this paper attempt to address?

The paper addresses the cost of differentiating through convex optimization layers with large-scale constraints in deep neural networks. Its aim is to reduce the dimension of the Jacobian matrix that arises in implicit differentiation, and thereby to speed up the computation. Specifically:

1. **Problems with Existing Methods**:
   - Existing methods usually compute gradients by implicitly differentiating the Karush-Kuhn-Tucker (KKT) conditions. This requires expensive computations on the Jacobian matrix, which makes it slow and memory-intensive.
   - For large-scale optimization layers, directly differentiating the KKT conditions does not scale well.
2. **The New Method Proposed in the Paper (Alternating Differentiation, Alt-Diff)**:
   - The authors propose Alternating Differentiation (Alt-Diff), a framework that splits the differentiation procedure into a primal update and a dual update and alternates between them.
   - Alt-Diff substantially reduces the dimension of the Jacobian matrix, which speeds up implicit differentiation, particularly for optimization problems with large-scale constraints.
3. **Main Contributions**:
   - **Improvement in Computational Efficiency**: Alt-Diff speeds up implicit differentiation by reducing the dimension of the Jacobian matrix.
   - **Consistency and Truncation Ability**: The gradients obtained by Alt-Diff are shown to be consistent with those obtained by differentiating the KKT conditions. Under some standard assumptions, the gradient error of truncated Alt-Diff is bounded by the same order as the variables' estimation error, so truncation can further accelerate the computation without sacrificing much accuracy.
   - **Experimental Verification**: A series of experiments supports the advantage of Alt-Diff on large-scale optimization problems, especially in terms of computational speed.
4. **Application Scenarios**:
   - The paper applies Alt-Diff to several optimization layers (such as the sparsemax layer, the dense quadratic layer, and the constrained Softmax layer) and evaluates its effectiveness and efficiency on practical tasks (such as energy generation scheduling and image classification).

In summary, the paper develops a method that speeds up implicit differentiation of convex optimization problems with large-scale constraints in deep neural networks, so that such optimization layers can be applied more efficiently in practice.
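
The alternating primal/dual idea summarized above can be illustrated on a toy quadratic program. The sketch below is not the authors' implementation: it assumes an ADMM-style splitting with a slack variable `s` for the inequality `Gx <= h`, treats the linear cost `q` as the differentiated parameter, and propagates the Jacobians of each update alongside the iterates. The function name `alt_diff_qp` and every variable name are hypothetical.

```python
import numpy as np

def alt_diff_qp(P, q, G, h, rho=1.0, iters=200):
    """Sketch: solve min 0.5*x'Px + q'x  s.t.  Gx <= h  by ADMM while
    propagating derivatives with respect to q through every update.

    Returns the final primal iterate x and the Jacobian dx/dq (n x n).
    Assumed splitting: Gx + s = h with slack s >= 0 and multiplier nu.
    """
    n, m = P.shape[0], G.shape[0]
    x, s, nu = np.zeros(n), np.zeros(m), np.zeros(m)
    Jx, Js, Jnu = np.zeros((n, n)), np.zeros((m, n)), np.zeros((m, n))
    # Hessian of the augmented Lagrangian in x: only n x n, independent of m.
    H = P + rho * G.T @ G
    for _ in range(iters):
        # Primal update (stationarity in x) and its derivative w.r.t. q.
        x = np.linalg.solve(H, -(q + G.T @ nu + rho * G.T @ (s - h)))
        Jx = np.linalg.solve(H, -(np.eye(n) + G.T @ Jnu + rho * G.T @ Js))
        # Slack update: projection onto s >= 0; its derivative is a 0/1 mask.
        s = np.maximum(0.0, h - G @ x - nu / rho)
        mask = (s > 0).astype(float)[:, None]
        Js = mask * (-(G @ Jx) - Jnu / rho)
        # Dual update and its derivative.
        nu = nu + rho * (G @ x + s - h)
        Jnu = Jnu + rho * (G @ Jx + Js)
    return x, Jx

# Toy instance: the first constraint (x1 + x2 <= 1) is active at the
# solution, the second (x1 <= 5) is inactive.
P = np.eye(2)
q = np.array([-2.0, -1.0])
G = np.array([[1.0, 1.0], [1.0, 0.0]])
h = np.array([1.0, 5.0])
x_star, dx_dq = alt_diff_qp(P, q, G, h)
```

For this instance the solution is the projection of `-q` onto the half-plane `x1 + x2 <= 1`, so the closed form gives `x* = (1, 0)` and `dx/dq = -(I - g g'/2)` with `g = (1, 1)`; the recursion should reproduce both. Stopping the loop after fewer iterations gives a truncated estimate, which is the trade-off the truncation result above refers to.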

Alternating Differentiation for Optimization Layers
