Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients

Yao Shu, Xiaoqiang Lin, Zhongxiang Dai, Bryan Kian Hsiang Low
2023-08-08
Abstract: Federated optimization, an emerging paradigm which finds wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically only share their local gradients. However, the gradient information is not available in many applications of federated optimization, which hence gives rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from the limitations of query and communication inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates. To this end, we (a) introduce trajectory-informed gradient surrogates, which are able to use the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these gradient surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over the existing approaches, which is supported by our real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to improve query efficiency and communication efficiency in federated zeroth-order optimization (ZOO). Specifically:

1. **Query efficiency**: In many applications of federated optimization, gradient information is unavailable or difficult to obtain. Existing federated ZOO algorithms therefore rely on the finite difference (FD) method to estimate gradients, which usually requires a large number of function queries. The paper proposes trajectory-informed gradient surrogates, which reuse the history of function queries to estimate gradients and so reduce the need for additional queries.
2. **Communication efficiency**: In existing federated ZOO algorithms, the local updates performed by each client differ substantially from the intended global updates, which lowers communication efficiency. The paper reduces this disparity with an adaptive gradient correction technique built on high-quality local and global gradient surrogates.

### Main contributions of the paper

1. **Trajectory-informed gradient surrogates**:
   - Introduces a derived Gaussian process, built from the trajectory of function queries, as each client's local gradient surrogate.
   - This surrogate needs only historical function query data and no additional function queries, which improves query efficiency.
2. **High-quality gradient correction**:
   - Uses a random Fourier features (RFF) approximation to construct a transferable global gradient surrogate.
   - Develops an adaptive gradient correction technique. By adjusting the gradient correction vector and length, it further reduces the disparity between local updates and the global objective and improves communication efficiency.

### Theoretical analysis

1. **Gradient disparity analysis**:
   - Proves that the proposed local gradient update has an advantage over existing methods in terms of gradient disparity.
   - Specifically, compared with existing methods, the new method does not require additional function queries, and the gradient estimation error can decrease exponentially.
2. **Convergence analysis**:
   - Proves the convergence of the FZooS algorithm under different assumptions.
   - Provides theoretical bounds on the number of random features M and the number of communication rounds R required to reach an ε convergence error for strongly convex and convex functions.

### Experimental verification

The paper evaluates the query efficiency and communication efficiency of the FZooS algorithm through synthetic experiments, federated black-box adversarial attacks, and federated non-differentiable metric optimization.

### Summary

By introducing trajectory-informed gradient surrogates and adaptive gradient correction, this paper addresses the query and communication inefficiency of federated zeroth-order optimization, offering a more efficient and accurate approach for practical applications.
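
To make the query-efficiency argument concrete, the following minimal Python sketch contrasts a finite-difference estimator, which spends fresh function queries at every step, with a gradient surrogate taken from the posterior mean of a derived Gaussian process fitted to past queries. It is an illustration under stated assumptions, not the paper's implementation: the zero-mean prior, the squared-exponential kernel, the `lengthscale` and `noise` values, the quadratic test objective, the `CountingObjective` helper, and the use of randomly scattered points as a stand-in for an optimization trajectory are all hypothetical choices made for this example.

```python
import numpy as np


def fd_gradient(f, x, num_dirs=20, step=1e-3, seed=0):
    """Finite-difference estimate; spends num_dirs + 1 fresh queries of f."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)  # random probing direction
        grad += (f(x + step * u) - fx) / step * u
    return grad / num_dirs


def gp_gradient_surrogate(X_hist, y_hist, x, lengthscale=1.0, noise=1e-6):
    """Gradient of the GP posterior mean at x (zero-mean prior, SE kernel).

    Uses only the stored (input, value) pairs; no new function queries.
    """
    # Kernel matrix over the history: K[i, j] = exp(-||x_i - x_j||^2 / (2 l^2))
    sq = np.sum((X_hist[:, None, :] - X_hist[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq / lengthscale**2)
    alpha = np.linalg.solve(K + noise * np.eye(len(X_hist)), y_hist)
    # d/dx k(x, x_i) = -(x - x_i) / l^2 * k(x, x_i)
    diff = x[None, :] - X_hist
    k_x = np.exp(-0.5 * np.sum(diff**2, axis=1) / lengthscale**2)
    dk = -(diff / lengthscale**2) * k_x[:, None]
    return dk.T @ alpha


class CountingObjective:
    """Hypothetical black-box objective f(x) = ||x||^2 that counts its queries."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return float(np.sum(x**2))


rng = np.random.default_rng(0)
f = CountingObjective()

# Assumed history of past queries (a stand-in for an optimization trajectory).
X_hist = rng.uniform(-1.0, 1.0, size=(100, 2))
y_hist = np.array([f(p) for p in X_hist])
x = np.array([0.3, -0.2])

calls_before = f.calls
g_surrogate = gp_gradient_surrogate(X_hist, y_hist, x)
surrogate_queries = f.calls - calls_before

calls_before = f.calls
g_fd = fd_gradient(f, x)
fd_queries = f.calls - calls_before

print("true gradient     :", 2 * x)
print("GP surrogate      :", g_surrogate, "| new queries:", surrogate_queries)
print("finite difference :", g_fd, "| new queries:", fd_queries)
```

In this sketch the surrogate's cost moves from function queries to a linear solve over the stored history, which is the trade-off the summary describes. How FZooS selects kernels, shares a global surrogate through RFF, and applies the adaptive correction is specified in the paper itself and is not reproduced here.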