Abstract: A novel gradient boosting framework is proposed in which shallow neural networks are employed as "weak learners". General loss functions are considered under this unified framework, with specific examples presented for classification, regression, and learning to rank. A fully corrective step is incorporated to remedy the pitfall of greedy function approximation in classic gradient boosting decision trees. The proposed model outperformed state-of-the-art boosting methods in all three tasks on multiple datasets. An ablation study is performed to shed light on the effect of each model component and of the model hyperparameters.

What problem does this paper attempt to address?

The paper aims to develop a new gradient boosting framework that uses shallow neural networks as "weak learners", in order to sidestep the complexity and difficulty of designing and training traditional deep neural networks (DNNs). Specifically, the authors introduce a model called GrowNet, which combines the strength of gradient boosting with the flexibility and versatility of neural networks, building a complex deep neural network layer by layer.

### Main Problems and Solutions

1. **Complex Design and Training of Traditional Deep Neural Networks**
   - **Problem**: Customizing a deep neural network for a specific application area is very difficult and requires a great deal of expertise and luck. Because there is no general design paradigm, practitioners often rely on heuristics or ad hoc solutions.
   - **Solution**: Apply the idea of gradient boosting to build the network layer by layer, so that the model grows in complexity gradually while each step stays simple and controllable.
2. **Limitations of Traditional Gradient Boosting Decision Trees (GBDT)**
   - **Problem**: Although GBDT performs well in many tasks, decision trees are not suitable for all fields. In tasks involving structured data in particular, deep neural networks usually perform better.
   - **Solution**: Use shallow neural networks as weak learners instead of decision trees, combining the expressive power of neural networks with the incremental learning of gradient boosting.
3. **Limitations of Greedy Function Approximation**
   - **Problem**: Classical gradient boosting approximates the target function greedily: each new weak learner is fitted while all earlier ones stay frozen, which can leave the ensemble at a locally optimal solution.
   - **Solution**: Introduce a global corrective step, which updates the parameters of all previously added weak learners in each iteration, helping the model avoid getting trapped in local optima and improving its overall performance.

### Specific Contributions

- **Propose a Novel Method**: Combine gradient boosting with deep neural networks to build complex deep neural networks layer by layer.
- **Develop an Optimization Algorithm**: Provide a training procedure that is faster and easier to use than that of traditional deep neural networks, incorporating second-order statistics and a global corrective step to improve stability and allow task-specific fine-tuning.
- **Demonstrate the Effectiveness of the Method**: Show experimentally that the method outperforms existing state-of-the-art methods in classification, regression, and ranking tasks on multiple real-world datasets.

### Summary

The main objective of the paper is to provide a more flexible and efficient way to build deep neural networks by combining gradient boosting with shallow neural networks, addressing the complexity and limitations of designing and training traditional deep neural networks.
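The boosting loop and corrective step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and leaves out details of the published method. It assumes a squared-error regression loss (for which the negative gradient is simply the residual), a fixed boost rate `alpha`, and plain full-batch gradient descent. The names `ShallowNet` and `fit_boosted_nets` and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


class ShallowNet:
    """One-hidden-layer tanh network used as a weak learner (illustrative only)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        # Cache the input and hidden activations for the next backward step.
        self.X = X
        self.H = np.tanh(X @ self.W1 + self.b1)
        return (self.H @ self.W2 + self.b2).ravel()

    def backward_step(self, grad_out, lr):
        """One gradient-descent step given dLoss/dOutput per sample, shape (n,)."""
        g = grad_out[:, None]
        dW2 = self.H.T @ g
        db2 = g.sum(axis=0)
        dZ = (g @ self.W2.T) * (1.0 - self.H ** 2)  # back through tanh
        dW1 = self.X.T @ dZ
        db1 = dZ.sum(axis=0)
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2


def fit_boosted_nets(X, y, n_learners=4, alpha=0.5, epochs=300,
                     corrective_epochs=100, lr=0.05, n_hidden=8, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    base = y.mean()          # constant initial prediction
    learners = []
    history = []             # training MSE after each boosting stage

    def predict(X_new):
        out = np.full(len(X_new), base)
        for f in learners:
            out += alpha * f.forward(X_new)
        return out

    for _ in range(n_learners):
        # Greedy stage: fit a new weak learner to the current residual,
        # i.e. the negative gradient of 0.5 * (y - F)^2 with respect to F.
        residual = y - predict(X)
        f = ShallowNet(X.shape[1], n_hidden, rng)
        for _ in range(epochs):
            out = f.forward(X)
            f.backward_step((out - residual) / n, lr)
        learners.append(f)

        # Corrective step: update the parameters of ALL learners added so
        # far by descending the loss of the whole ensemble.
        for _ in range(corrective_epochs):
            outs = [g.forward(X) for g in learners]
            pred = base + alpha * sum(outs)
            grad = alpha * (pred - y) / n  # dLoss/d(output of each learner)
            for g in learners:
                g.backward_step(grad, lr)

        history.append(float(np.mean((predict(X) - y) ** 2)))

    return predict, history
```

Calling `predict, history = fit_boosted_nets(X, y)` on a small regression dataset returns a prediction function and the training error after each stage. Setting `corrective_epochs=0` recovers the purely greedy variant, which gives a simple way to see what the corrective step contributes.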

Gradient Boosting Neural Networks: GrowNet

UniGrad-FS: Unified Gradient Projection with Flatter Sharpness for Continual Learning

Gradient Networks

BNGBS: An efficient network boosting system with triple incremental learning capabilities for more nodes, samples, and classes

A Deep Gradient Boosting Network for Optic Disc and Cup Segmentation

Gradient Correction Beyond Gradient Descent

Learning Gradient Descent: Better Generalization and Longer Horizons

Enhanced Gradient Learning for Deep Neural Networks

Gradient-Boosted Based Structured and Unstructured Learning

Boosted Dynamic Neural Networks

XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

A Gradient Boosting Approach for Training Convolutional and Deep Neural Networks

RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping

Gradient and Newton boosting for classification and regression

Sequential Training of Neural Networks with Gradient Boosting

Provable Guarantees for Neural Networks via Gradient Feature Learning

Gradient Adversarial Training of Neural Networks

BGADAM: Boosting based Genetic-Evolutionary ADAM for Neural Network Optimization

A Gradient-Guided Evolutionary Approach to Training Deep Neural Networks

Intelligent gradient amplification for deep neural networks

Gradient Descent: The Ultimate Optimizer