Charles K. Chui, Shao-Bo Lin, Ding-Xuan Zhou
Abstract: The subject of deep learning has recently attracted users of machine learning from various disciplines, including medical diagnosis and bioinformatics, financial market analysis and online advertisement, speech and handwriting recognition, computer vision and natural language processing, time series forecasting, and search engines. However, the theoretical development of deep learning is still in its infancy. The objective of this paper is to introduce a deep neural network (also called deep-net) approach to localized manifold learning, with each hidden layer endowed with a specific learning task. For the purpose of illustration, we focus only on deep-nets with three hidden layers, with the first layer for dimensionality reduction, the second layer for bias reduction, and the third layer for variance reduction. A feedback component is also designed to eliminate outliers. The main theoretical result in this paper is the order $\mathcal O\left(m^{-2s/(2s+d)}\right)$ of approximation of the regression function with regularity $s$, in terms of the number $m$ of sample points, where the (unknown) manifold dimension $d$ replaces the dimension $D$ of the sampling (Euclidean) space for shallow nets.
What problem does this paper attempt to address?
The paper addresses localized deep learning: it constructs a deep neural network (deep-net) in which each hidden layer is assigned a specific learning task. Specifically, it studies, from a theoretical standpoint, the advantages of such a network for data lying on an unknown manifold, in particular its ability to approximate regression functions. Compared with traditional shallow neural networks, the paper shows that the deep-net performs better in reducing bias and variance, and it proposes a feedback mechanism to eliminate outliers, thereby improving the learning result.
### Main Contributions
1. **Theoretical Analysis**: The paper proposes a deep network structure with three hidden layers, each of which has a specific learning task. The first layer is used for dimension reduction, the second layer for bias reduction, and the third layer for variance reduction.
2. **Approximation Ability**: The paper proves that the constructed deep network approximates a regression function of regularity \( s \) at the rate \( O(m^{-2s/(2s + d)}) \), where \( m \) is the number of sample points and \( d \) is the dimension of the unknown manifold.
3. **Feedback Mechanism**: A feedback mechanism is introduced to eliminate outliers in the learning process, further improving the robustness and performance of the model.
4. **Learning Rate Analysis**: The analysis shows that the deep network attains a better learning rate than shallow neural networks, especially for high-dimensional data, because the rate depends on the manifold dimension \( d \) rather than the dimension \( D \) of the ambient Euclidean space.
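To make the division of labor in the list above more concrete, here is a minimal toy sketch in plain Python. It is not the paper's deep-net construction: it assumes a unit circle (\( d = 1 \)) embedded in \( \mathbb{R}^5 \), uses nearest-center cells as a stand-in for localization, a per-cell mean as a stand-in for the local fit and averaging, and a simple truncation as a stand-in for outlier control. All function names, the target function, and the parameter values are hypothetical.

```python
import math
import random

def sample_circle(m, D=5, noise=0.1, seed=0):
    """Draw m noisy samples whose inputs lie on a unit circle (d = 1) inside R^D."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        t = rng.uniform(0.0, 2.0 * math.pi)
        x = [math.cos(t), math.sin(t)] + [0.0] * (D - 2)
        y = math.sin(2.0 * t) + rng.gauss(0.0, noise)  # hypothetical target plus noise
        data.append((x, y))
    return data

def circle_centers(n, D=5):
    """n equally spaced cell centers on the same circle (assumed known for this toy)."""
    return [[math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n)] + [0.0] * (D - 2)
            for j in range(n)]

def nearest(x, centers):
    """Localization: index of the center closest to x in the ambient space."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))

def fit(data, centers, clip=1.0):
    """Per-cell mean of the responses, truncated to [-clip, clip]."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for x, y in data:
        j = nearest(x, centers)
        sums[j] += y
        counts[j] += 1
    # Empty cells fall back to 0.0; truncation is a crude stand-in for outlier control.
    return [max(-clip, min(clip, s / c)) if c else 0.0 for s, c in zip(sums, counts)]

def predict(x, centers, values):
    """Piecewise-constant prediction: value of the cell that x falls into."""
    return values[nearest(x, centers)]

centers = circle_centers(40)
values = fit(sample_circle(2000), centers)

# Mean squared error against the noise-free target on an evenly spaced grid.
grid = [2.0 * math.pi * k / 400 for k in range(400)]
mse = sum(
    (predict([math.cos(t), math.sin(t), 0.0, 0.0, 0.0], centers, values) - math.sin(2.0 * t)) ** 2
    for t in grid
) / len(grid)
print(f"toy mean squared error: {mse:.4f}")
```

The number of cells only has to scale with the intrinsic dimension of the circle, not with the ambient dimension 5, which is the intuition behind the manifold dimension \( d \) appearing in the rate.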
### Key Formulas
- **Approximation Accuracy**:
\[
E\left[\|N_3 - f_\rho\|^2_\rho\right] \leq C_1 m^{-2s/(2s + d)}
\]
where \( C_1 \) is a positive constant independent of \( m \).
- **Fine-tuned Learning Rate**:
\[
E\left[\|N_F^3 - f_\rho\|^2_\rho\right] \leq C_2' m^{-2s/(2s + d)}
\]
where \( C_2' \) is a positive constant independent of \( m \).
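The practical meaning of the exponent \( 2s/(2s + d) \) can be illustrated with a small calculation. The sketch below compares it with the exponent obtained when the ambient dimension \( D \) takes the place of \( d \), as the abstract indicates for shallow nets. The constants \( C_1 \) and \( C_2' \) are set to 1 purely for illustration, and the values of \( s \), \( d \), \( D \), and the target error are hypothetical.

```python
def rate_exponent(s: float, dim: int) -> float:
    """Exponent a in a bound of the form m**(-a), with a = 2s / (2s + dim)."""
    return 2.0 * s / (2.0 * s + dim)

def rate_bound(m: int, s: float, dim: int) -> float:
    """Value of m**(-2s/(2s+dim)); the multiplicative constant is assumed to be 1."""
    return m ** (-rate_exponent(s, dim))

def samples_for_target(eps: float, s: float, dim: int) -> float:
    """Sample size m at which m**(-a) equals eps, i.e. m = eps**(-1/a)."""
    return eps ** (-1.0 / rate_exponent(s, dim))

# Hypothetical setting: regularity s = 2, manifold dimension d = 2, ambient dimension D = 100.
s, d, D = 2.0, 2, 100
eps = 0.1

print("exponent with d:", rate_exponent(s, d))
print("exponent with D:", rate_exponent(s, D))
print("samples needed with d:", samples_for_target(eps, s, d))
print("samples needed with D:", samples_for_target(eps, s, D))
```

With these hypothetical numbers the exponent is \( 2/3 \) for \( d = 2 \) and \( 1/26 \) for \( D = 100 \), so the sample size needed to reach the same bound differs by many orders of magnitude.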
### Related Work
The paper also discusses related research on topics such as object recognition, unsupervised training, and artificial intelligence architectures, all of which point to the importance and advantages of deep learning. In particular, the paper compares shallow and deep neural networks in terms of approximation ability and learning rate, which further supports the advantage of the deep network on complex data sets.
### Conclusion
Through theoretical analysis and experimental verification, the paper shows that the deep network has significant advantages in processing high-dimensional data located on low-dimensional manifolds, especially in approximating regression functions and reducing bias and variance. These results strengthen the theoretical basis of deep learning and offer guidance for model design in practical applications.