Abstract: The rapid evolution and widespread applicability of machine learning and artificial intelligence have fundamentally reshaped modern life. Three key drivers stand behind this ubiquitous emergence: big data, growing computing power, and improved algorithms. The need for distributed storage and processing arises from the ``data deluge'' that can overwhelm even a powerful machine, from geographically dispersed data that are prohibitively costly to transfer to a central unit for further processing, and from the prevalent Internet-of-Things (IoT) devices that require real-time response as well as respect for privacy. Modern machine learning algorithms built to exploit such huge amounts of data are often computationally ``hungry'', and their ``appetite'' for computing power grows at a pace unmatched by the development of computing hardware. All these considerations justify the pressing need for distributed optimization algorithms that are scalable yet flexible enough to adapt to various configurations of networked computing nodes. To cope with these challenges, the present thesis first introduces a novel approach based on the alternating direction method of multipliers (ADMM), termed hybrid ADMM, for efficient decentralized optimization. By modeling the underlying communication patterns as hypergraphs, it provides a unifying framework that subsumes both centralized and fully decentralized counterparts as special cases, and allows nodes to communicate in centralized and decentralized ways at the same time. Leveraging the expressiveness of hypergraph models, a technique termed ``in-network acceleration'' is introduced, enabling an ``almost free'' performance gain by exploiting local graph topology. To account for heterogeneity of nodes and edges, a diagonal-scaling-based approach is proposed to handle weighted updates, where proper edge weights are identified by solving a preconditioning problem. 
By assigning larger weights to critical edges, the proposed algorithm achieves higher efficiency and becomes more robust to perturbations. Finally, to bring the efficiency of the whole system to its full potential, an asynchronous method is introduced to mitigate the straggler problem, so that nodes with different processing power can each run at full speed. A convergence analysis of the proposed algorithms is provided, revealing the connection between the convergence rate and the spectral properties of the communication graph. Numerical tests on several common tasks over different graphs demonstrate the effectiveness of the proposed algorithms.
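For orientation, the centralized special case mentioned in the abstract can be illustrated with the standard global-consensus ADMM in scaled form. The sketch below is not the hybrid ADMM of the thesis; it is a minimal illustration under assumed conditions: each node i holds a scalar quadratic cost f_i(x) = 0.5 (x - a_i)^2, so the consensus optimum is the average of the a_i. The function name `consensus_admm` and the parameters `rho` and `iters` are hypothetical and chosen only for this example.

```python
def consensus_admm(a, rho=1.0, iters=200):
    """Standard global-consensus ADMM (scaled form) for the assumed
    local costs f_i(x) = 0.5 * (x - a[i])**2.

    Illustrative sketch only; this is NOT the hybrid ADMM of the thesis.
    """
    n = len(a)
    z = 0.0          # consensus variable held by the central unit
    u = [0.0] * n    # scaled dual variable, one per node

    for _ in range(iters):
        # Local step: each node minimizes its own cost plus the
        # quadratic penalty (rho/2) * (x - z + u_i)**2 in closed form.
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # Central step: average the nodes' primal-plus-dual estimates.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each node's disagreement with consensus.
        u = [u[i] + x[i] - z for i in range(n)]

    return z


if __name__ == "__main__":
    # With these assumed local data, the iterate approaches their mean.
    print(consensus_admm([1.0, 2.0, 6.0]))
```

In this toy setting every node communicates with a single central unit at each iteration. A fully decentralized variant would replace the central averaging step with exchanges between neighboring nodes, which is where the topology of the communication graph enters.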

Communication Efficient Parallel Algorithms for Optimization on Manifolds

Parallel Stochastic Optimization Framework for Large-Scale Non-Convex Stochastic Problems

Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis

A Unified Algorithmic Framework for Distributed Composite Optimization

A Unified Contraction Analysis of a Class of Distributed Algorithms for Composite Optimization

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

A General Framework for Distributed Partitioned Optimization

Improving the communication in decentralized manifold optimization through single-step consensus and compression

Communication-efficient Distributed Newton-like Optimization with Gradients and M-estimators

Global Optimization with Orthogonality Constraints Via Stochastic Diffusion on Manifold

Accelerating Distributed Optimization via Fixed-time Convergent Flows: Extensions to Non-convex Functions and Consistent Discretization

L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

Optimization on manifolds: A symplectic approach

Collective Communication Optimization for Solving Linear Algebraic Equations

Efficient Methods for Decentralized Optimization over Graphs

Gradient Sparsification for Communication-Efficient Distributed Optimization

Decentralized projected Riemannian stochastic recursive momentum method for smooth optimization on compact submanifolds

On Optimizing the Communication of Model Parallelism

Communication-efficient distributed optimization with adaptability to system heterogeneity

Finite Projective Geometry based Fast, Conflict-free Parallel Matrix Computations

Convergence in High Probability of Distributed Stochastic Gradient Descent Algorithms