Julia Nakhleh, Joseph Shenouda, Robert D. Nowak
Abstract: This paper studies the properties of solutions to multi-task shallow ReLU neural network learning problems, wherein the network is trained to fit a dataset with minimal sum of squared weights. Remarkably, the solutions learned for each individual task resemble those obtained by solving a kernel method, revealing a novel connection between neural networks and kernel methods. It is known that single-task neural network training problems are equivalent to minimum-norm interpolation problems in a non-Hilbertian Banach space, and that the solutions of such problems are generally non-unique. In contrast, we prove that the solutions to univariate-input, multi-task neural network interpolation problems are almost always unique, and coincide with the solution to a minimum-norm interpolation problem in a Sobolev (Reproducing Kernel) Hilbert Space. We also demonstrate a similar phenomenon in the multivariate-input case; specifically, we show that neural network learning problems with large numbers of diverse tasks are approximately equivalent to an $\ell^2$ (Hilbert space) minimization problem over a fixed kernel determined by the optimal neurons.
What problem does this paper attempt to address?
The paper asks how multi-task training shapes the functions learned by shallow ReLU neural networks, and in particular how the solution for each task differs from its single-task counterpart when the network is trained to minimize the sum of squared weights. Specifically, the paper explores the following points:
1. **Uniqueness of multi-task learning solutions**: For univariate input ($d = 1$), the paper proves that the solutions of the multi-task interpolation problem are almost always unique, and characterizes the special cases in which they are not.
2. **Equivalence between multi-task learning and kernel methods**: When the univariate solution is unique, it is the "connect-the-dots" interpolant, i.e. the piecewise-linear function joining consecutive data points, which is also the minimum-norm interpolant in the Sobolev space $H^1$. Each task's solution therefore coincides with the solution of a kernel method. By contrast, single-task solutions are generally non-unique and are minimum-norm interpolants in the non-Hilbertian Banach space $BV^2$.
3. **Insights into multivariate multi-task problems**: The paper provides experimental evidence and mathematical analysis indicating that similar conclusions hold in the multivariate setting. Specifically, when the tasks are numerous and diverse, the solution for each task is approximately the minimum-norm solution in a particular RKHS.
### Formula Summary
- The form of the ReLU neural network function:
\[
f_\theta(x)=\sum_{k = 1}^{K}v_k(w_k^{\top}x + b_k)_++Ax + c
\]
where $(\cdot)_+=\max\{0,\cdot\}$, $w_k\in\mathbb{R}^d$, $v_k\in\mathbb{R}^T$, $b_k\in\mathbb{R}$, $A\in\mathbb{R}^{T\times d}$, $c\in\mathbb{R}^T$.
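
The network form above can be sketched as a short NumPy forward pass. This is only an illustrative sketch: the function name `relu_network`, the array layout (rows of `W` are the $w_k$, columns of `V` are the $v_k$), and the toy parameter values are assumptions made here, not taken from the paper.

```python
import numpy as np

def relu_network(x, W, b, V, A, c):
    """Evaluate f_theta(x) = sum_k v_k (w_k^T x + b_k)_+ + A x + c.

    Assumed shapes: x (d,), W (K, d), b (K,), V (T, K), A (T, d), c (T,).
    Returns a vector of length T, one output per task.
    """
    hidden = np.maximum(0.0, W @ x + b)  # ReLU activations, shape (K,)
    return V @ hidden + A @ x + c        # task outputs, shape (T,)

# Toy example (made-up values): d = 2 inputs, K = 3 neurons, T = 2 tasks.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, -1.0, 0.5])
V = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]])
A = np.zeros((2, 2))
c = np.array([0.5, -0.5])
x = np.array([1.0, 2.0])

out = relu_network(x, W, b, V, A, c)
```

All $T$ tasks share the same hidden neurons $(w_k, b_k)$ and differ only in the output weights $v_k$, which is what couples the tasks in the weight-decay objective.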
- The weight-decay interpolation problem:
\[
\min_{\theta}\sum_{k = 1}^{K}\left(\|v_k\|_2^2+\|w_k\|_2^2\right)\quad\text{subject to}\quad f_\theta(x_i)=y_i,\quad i = 1,\ldots,N
\]
- The equivalent optimization problem:
\[
\min_{\theta}\sum_{k = 1}^{K}\|v_k\|_2\quad\text{subject to}\quad\|w_k\|_2 = 1,\quad f_\theta(x_i)=y_i,\quad i = 1,\ldots,N
\]
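
The link between the two problems rests on the positive homogeneity of the ReLU: rescaling a neuron as $(v_k, w_k, b_k) \mapsto (v_k/a,\, a w_k,\, a b_k)$ with $a > 0$ leaves its output unchanged, and the best choice of $a$ reduces $\|v_k\|_2^2 + \|w_k\|_2^2$ to $2\|v_k\|_2\|w_k\|_2$. The following is a minimal numerical check of that standard argument, with randomly drawn, purely illustrative values; the helper names `neuron` and `cost` are hypothetical.

```python
import numpy as np

def neuron(x, v, w, b):
    """Output of a single neuron: v * (w^T x + b)_+ (a vector of length T)."""
    return v * max(0.0, float(w @ x + b))

rng = np.random.default_rng(0)
v = rng.normal(size=3)   # output weights for T = 3 tasks (illustrative)
w = rng.normal(size=2)   # input weights for d = 2 (illustrative)
b = 0.3
x = rng.normal(size=2)

# 1) Rescaling by any a > 0 does not change the neuron's output.
a = 2.5
same_output = np.allclose(neuron(x, v, w, b), neuron(x, v / a, a * w, a * b))

# 2) The squared-weight cost of the rescaled neuron is minimized when the
#    two norms are balanced, where it equals 2 * ||v|| * ||w||.
def cost(a):
    return np.sum((v / a) ** 2) + np.sum((a * w) ** 2)

a_star = np.sqrt(np.linalg.norm(v) / np.linalg.norm(w))
balanced_cost = cost(a_star)
product_bound = 2 * np.linalg.norm(v) * np.linalg.norm(w)
```

Because the bias is not penalized in the objective above, it can be rescaled freely; fixing $\|w_k\|_2 = 1$ then leaves $\sum_k \|v_k\|_2$ as the quantity to minimize, up to a constant factor.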
- The slope of the connect-the-dots (piecewise-linear) interpolant for task $t$ on the interval $[x_i, x_{i+1}]$:
\[
s_i^t=\frac{y_{i + 1,t}-y_{i,t}}{x_{i+1}-x_i}
\]
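- Non-uniqueness condition: writing $s_i=(s_i^1,\ldots,s_i^T)$ for the vector of slopes across tasks, the solution fails to be unique when,
for some $i = 2,\ldots,N - 2$, the vectors
\[
s_i - s_{i-1}=\frac{y_{i+1}-y_i}{x_{i+1}-x_i}-\frac{y_i - y_{i-1}}{x_i - x_{i-1}}
\]
and
\[
s_{i+1}-s_i=\frac{y_{i+2}-y_{i+1}}{x_{i+2}-x_{i+1}}-\frac{y_{i+1}-y_i}{x_{i+1}-x_i}
\]
are both nonzero and aligned. When no such $i$ exists, the connect-the-dots interpolant is the unique solution.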
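
The slope formula and the alignment condition above can be checked numerically for a given dataset. The sketch below makes several assumptions that are not spelled out in this summary: the inputs `x` are sorted and distinct, "aligned" is read as "one vector is a positive multiple of the other", and a small tolerance `tol` stands in for exact comparisons. The function names are hypothetical.

```python
import numpy as np

def slopes(x, Y):
    """Slope vectors s_i of the connect-the-dots interpolant.

    x: (N,) sorted, distinct inputs; Y: (N, T) labels, one column per task.
    Returns an (N-1, T) array whose row i-1 is s_i (1-based index i).
    """
    return np.diff(Y, axis=0) / np.diff(x)[:, None]

def has_aligned_slope_changes(x, Y, tol=1e-9):
    """True if, for some i, s_i - s_{i-1} and s_{i+1} - s_i are both
    nonzero and aligned (assumed here: positive multiples of each other)."""
    d = np.diff(slopes(x, Y), axis=0)   # consecutive slope changes, (N-2, T)
    for u, v in zip(d[:-1], d[1:]):     # pairs for i = 2, ..., N-2
        nu, nv = np.linalg.norm(u), np.linalg.norm(v)
        if nu > tol and nv > tol and u @ v >= (1 - tol) * nu * nv:
            return True
    return False

x = np.array([0.0, 1.0, 2.0, 3.0])

# Single task (T = 1): slopes 0, 1, 2, so both slope changes equal +1.
Y_single = np.array([[0.0], [0.0], [1.0], [3.0]])
single_flag = has_aligned_slope_changes(x, Y_single)

# Two tasks: the second task makes the slope-change vectors point apart.
Y_multi = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 2.0]])
multi_flag = has_aligned_slope_changes(x, Y_multi)
```

With one task the slope changes are scalars, so any two consecutive changes of the same sign are aligned. With several tasks the changes are vectors in $\mathbb{R}^T$, and exact alignment becomes a special case, which is consistent with the "almost always unique" claim.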
Through these results, the paper shows how multi-task learning changes the solutions learned by neural networks and establishes a connection with traditional kernel methods.