The D-Subspace Algorithm for Online Learning over Distributed Networks

Yitong Chen, Danqi Jin, Jie Chen
2024-10-26
Abstract: This material introduces the D-Subspace algorithm, derived from the centralized algorithm [1], which originally addresses parameter estimation problems under a subspace constraint.
Signal Processing
What problem does this paper attempt to address?
This paper addresses online learning over distributed networks, specifically parameter estimation under a low-rank subspace constraint. Its goal is to develop an algorithm, named "D-Subspace", for efficient parameter estimation and optimization in a distributed network.

### Problem Background

The paper considers a connected network of \( N \) nodes. Each node \( k\in\mathcal{N} \) has a strongly convex, real-valued, differentiable cost function \( J_k(w_k) \), defined as the expected value of a loss function \( G_k(w_k; s_{k,n}) \):

\[ J_k(w_k)\triangleq\mathbb{E}\{G_k(w_k; s_{k,n})\} \]

where \( \mathbb{E}\{\cdot\} \) denotes the expectation with respect to the distribution of the random data \( s_{k,n} \), and the subscripts \( k \) and \( n \) denote the node index and the time instant, respectively. The true parameter vector \( w_k^*\in\mathbb{R}^L \) of node \( k \) is the unique minimizer of \( J_k(w_k) \). Define the matrix \( W^* \) as:

\[ W^*\triangleq[w_1^*, w_2^*, \cdots, w_N^*]\in\mathbb{R}^{L\times N} \]

### Low-Rank Assumption

The paper assumes that \( W^* \) is a low-rank matrix with rank \( r^* \), that is:

\[ w_k^*=\sum_{i = 1}^{r^*}\alpha_{k,i}^o c_i = C\cdot\alpha_k^o \]

where \( \{c_i\}_{i = 1}^{r^*} \) is a set of basis vectors, \( \{\alpha_{k,i}^o\}_{i = 1}^{r^*} \) are the corresponding weights, \( C\triangleq[c_1, c_2, \cdots, c_{r^*}]\in\mathbb{R}^{L\times r^*} \), and \( \alpha_k^o\triangleq[\alpha_{k,1}^o, \alpha_{k,2}^o, \cdots, \alpha_{k,r^*}^o]^{\top} \). Assume that \( \alpha_k^o \) is known. Stacking the above expression over all nodes gives:

\[ W^* = C\cdot\Theta^o \]

where the matrix \( \Theta^o\triangleq[\alpha_1^o, \alpha_2^o, \cdots, \alpha_N^o]\in\mathbb{R}^{r^* \times N} \) is also known.
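The factorization \( W^* = C\cdot\Theta^o \) can be checked numerically. The following is a minimal sketch, not the paper's code: the dimensions `L`, `N`, `r`, the random Gaussian draws for `C` and `Theta`, and the helper name `build_low_rank_targets` are all illustrative assumptions. It builds \( W^* \) from a basis and a weight matrix, then confirms that its rank equals \( r^* \) and that each column satisfies \( w_k^* = C\cdot\alpha_k^o \).

```python
import numpy as np


def build_low_rank_targets(L=8, N=5, r=2, seed=0):
    """Hypothetical helper: draw a random basis C (L x r) and weights Theta (r x N),
    and return them together with W_star = C @ Theta (L x N)."""
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((L, r))      # columns play the role of c_1, ..., c_r
    Theta = rng.standard_normal((r, N))  # column k plays the role of alpha_k^o
    W_star = C @ Theta                   # column k is the target vector w_k^*
    return C, Theta, W_star


C, Theta, W_star = build_low_rank_targets()

# Rank of W_star: at most r by construction, and equal to r for generic draws.
rank = int(np.linalg.matrix_rank(W_star))

# Column-wise check of w_k^* = C alpha_k^o for every node k.
columns_match = all(
    np.allclose(W_star[:, k], C @ Theta[:, k]) for k in range(W_star.shape[1])
)

print("shape of W_star:", W_star.shape)
print("rank of W_star:", rank)
print("every column equals C @ alpha_k:", columns_match)
```

With \( \Theta^o \) known, only the \( L\times r^* \) entries of \( C \) remain unknown, instead of the \( L\times N \) entries of an unconstrained \( W^* \); this is the saving the low-rank assumption offers when \( r^* < N \).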
### Centralized Optimization Problem

Based on the above assumptions, the centralized optimization problem can be expressed as:

\[ \arg\min_{w_{\ell}:\ell\in\mathcal{N}}\sum_{\ell = 1}^N J_{\ell}(w_{\ell}) \]

\[ \text{s.t. }[W^{\top}](:, j)\in\mathcal{R}([\Theta^o]^{\top}),\quad\forall j \]

where \( W\triangleq[w_{\ell}]_{\ell\in\mathcal{N}} \) is the estimated value of \( W^* \), and \( \mathcal{R}(\cdot) \) represents the range space operator.

### Distributed Optimization Problem

Since the network is connected and only local data exchange is allowed in distributed processing, for each node \(