Alternating Differentiation for Optimization Layers

Haixiang Sun, Ye Shi, Jingya Wang, Hoang Duong Tuan, H. Vincent Poor, Dacheng Tao
DOI: https://doi.org/10.48550/arXiv.2210.01802
2023-04-24
Abstract: The idea of embedding optimization problems into deep neural networks as optimization layers to encode constraints and inductive priors has taken hold in recent years. Most existing methods focus on implicitly differentiating Karush-Kuhn-Tucker (KKT) conditions in a way that requires expensive computations on the Jacobian matrix, which can be slow and memory-intensive. In this paper, we developed a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically in the form of convex optimization problems with polyhedral constraints) in a fast and recursive way. Alt-Diff decouples the differentiation procedure into a primal update and a dual update in an alternating way. Accordingly, Alt-Diff substantially decreases the dimensions of the Jacobian matrix especially for optimization with large-scale constraints and thus increases the computational speed of implicit differentiation. We show that the gradients obtained by Alt-Diff are consistent with those obtained by differentiating KKT conditions. In addition, we propose to truncate Alt-Diff to further accelerate the computational speed. Under some standard assumptions, we show that the truncation error of gradients is upper bounded by the same order of variables' estimation error. Therefore, Alt-Diff can be truncated to further increase computational speed without sacrificing much accuracy. A series of comprehensive experiments validate the superiority of Alt-Diff.
Machine Learning, Artificial Intelligence, Optimization and Control
What problem does this paper attempt to address?
The paper addresses how to speed up differentiation through convex optimization layers with large-scale constraints in deep neural networks. In particular, it aims to reduce the dimension of the Jacobian matrix that arises during implicit differentiation. Specifically:

1. **Problems with Existing Methods**:
   - Existing methods usually compute gradients by implicitly differentiating the Karush–Kuhn–Tucker (KKT) conditions, which requires expensive computations on the Jacobian matrix and is therefore slow and memory-intensive.
   - For large-scale optimization layers, directly differentiating the KKT conditions is difficult to scale.
2. **The New Method Proposed in the Paper (Alternating Differentiation, Alt-Diff)**:
   - The authors propose a new framework, Alternating Differentiation (Alt-Diff), for convex optimization problems with polyhedral constraints. It decouples the differentiation procedure into a primal update and a dual update and applies them alternately.
   - Alt-Diff substantially reduces the dimension of the Jacobian matrix, which speeds up implicit differentiation and makes the method especially suitable for optimization problems with large-scale constraints.
3. **Main Contributions**:
   - **Improvement in Computational Efficiency**: Alt-Diff speeds up implicit differentiation by working with a smaller Jacobian matrix than the one required when differentiating the full KKT system.
   - **Consistency and Truncation Ability**: The gradients obtained by Alt-Diff are shown to be consistent with those obtained by differentiating the KKT conditions. Under some standard assumptions, the truncation error of the gradients is bounded by the same order as the estimation error of the variables, so Alt-Diff can be truncated to run faster without sacrificing much accuracy.
   - **Experimental Verification**: A series of experiments supports the advantage of Alt-Diff on large-scale optimization problems, especially in terms of computational speed.
4. **Application Scenarios**:
   - The paper applies Alt-Diff to several optimization layers (such as the sparsemax layer, the dense quadratic layer, and the Softmax layer with constraints) and evaluates its effectiveness and efficiency on practical tasks (such as energy generation scheduling and image classification).

In summary, the paper develops a method that speeds up implicit differentiation of convex optimization problems with large-scale constraints in deep neural networks, so that such optimization layers can be applied more efficiently in practical scenarios.
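The alternating primal/dual differentiation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it makes simplifying assumptions: the layer is a quadratic program with equality constraints only (the inequality constraints and slack variables of the paper's polyhedral setting are omitted), the forward solver is a plain augmented-Lagrangian iteration, and only the Jacobian with respect to the linear cost term `q` is tracked. The function names `alt_diff_eq_qp` and `kkt_jacobian`, the penalty `rho`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def alt_diff_eq_qp(P, q, A, b, rho=1.0, iters=200):
    """Solve min 0.5 x'Px + q'x s.t. Ax = b while tracking dx/dq.

    Hypothetical helper for illustration only (equality constraints only).
    """
    n, m = P.shape[0], A.shape[0]
    x = np.zeros(n)
    lam = np.zeros(m)            # dual variable for Ax = b
    J_x = np.zeros((n, n))       # running estimate of dx/dq
    J_lam = np.zeros((m, n))     # running estimate of dlam/dq
    # Hessian of the augmented Lagrangian in x: n x n, reused at every step.
    H = P + rho * A.T @ A
    for _ in range(iters):
        # Primal update: stationarity of the augmented Lagrangian in x.
        x = np.linalg.solve(H, -(q + A.T @ lam - rho * A.T @ b))
        # Derivative of the primal update with respect to q.
        J_x = np.linalg.solve(H, -(np.eye(n) + A.T @ J_lam))
        # Dual update and its derivative.
        lam = lam + rho * (A @ x - b)
        J_lam = J_lam + rho * (A @ J_x)
    return x, lam, J_x


def kkt_jacobian(P, A):
    """Reference dx/dq from differentiating the full (n+m) x (n+m) KKT system."""
    n, m = P.shape[0], A.shape[0]
    K = np.block([[P, A.T], [A, np.zeros((m, m))]])
    rhs = np.vstack([-np.eye(n), np.zeros((m, n))])
    return np.linalg.solve(K, rhs)[:n]


# Toy problem (made-up data): 3 variables, 1 equality constraint.
P = np.diag([2.0, 1.0, 3.0])
q = np.array([1.0, -1.0, 0.5])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

x_star, lam_star, J_alt = alt_diff_eq_qp(P, q, A, b)
J_kkt = kkt_jacobian(P, A)
print("max |J_alt - J_kkt| =", np.max(np.abs(J_alt - J_kkt)))
```

In this sketch each step solves only an `n x n` system, whereas the KKT reference solves an `(n + m) x (n + m)` system, which mirrors the dimension reduction the summary refers to. Passing a smaller `iters` gives a truncated Jacobian estimate, in the spirit of the truncation result stated in the abstract.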