PID Controller-Based Stochastic Optimization Acceleration for Deep Neural Networks

Haoqian Wang, Yi Luo, Wangpeng An, Qingyun Sun, Jun Xu, Lei Zhang
DOI: https://doi.org/10.1109/tnnls.2019.2963066
IF: 14.255
2020-12-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Deep neural networks (DNNs) are widely used and demonstrated their power in many applications, such as computer vision and pattern recognition. However, the training of these networks can be time consuming. Such a problem could be alleviated by using efficient optimizers. As one of the most commonly used optimizers, stochastic gradient descent-momentum (SGD-M) uses past and present gradients for parameter updates. However, in the process of network training, SGD-M may encounter some drawbacks, such as the overshoot phenomenon. This problem would slow the training convergence. To alleviate this problem and accelerate the convergence of DNN optimization, we propose a proportional-integral-derivative (PID) approach. Specifically, we investigate the intrinsic relationships between the PID-based controller and SGD-M first. We further propose a PID-based optimization algorithm to update the network parameters, where the past, current, and change of gradients are exploited. Consequently, our proposed PID-based optimization alleviates the overshoot problem suffered by SGD-M. When tested on popular DNN architectures, it also obtains up to 50% acceleration with competitive accuracy. Extensive experiments about computer vision and natural language processing demonstrate the effectiveness of our method on benchmark data sets, including CIFAR10, CIFAR100, Tiny-ImageNet, and PTB. We have released the code at https://github.com/tensorboy/PIDOptimizer.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to accelerate the training of deep neural networks (DNNs) by addressing the overshoot problem of stochastic gradient descent with momentum (SGD-M). Specifically:

1. **Problem Background**:
   - Training DNNs for applications such as computer vision and pattern recognition can be time-consuming; more efficient optimizers can reduce this cost.
   - SGD-M, one of the most commonly used optimizers, updates parameters using past and present gradients, but during training it can suffer from the overshoot phenomenon, which slows convergence.
2. **Research Objectives**:
   - Investigate the intrinsic relationship between a proportional-integral-derivative (PID) controller and SGD-M.
   - Use this relationship to design an optimizer that alleviates overshoot and speeds up DNN convergence.
3. **Main Contributions**:
   - A PID-based optimization algorithm that updates network parameters using the past, current, and change of gradients, alleviating the overshoot suffered by SGD-M.
   - Experiments on popular DNN architectures in computer vision and natural language processing, on the CIFAR10, CIFAR100, Tiny-ImageNet, and PTB benchmarks, showing up to 50% acceleration with competitive accuracy.
   - Publicly released code at https://github.com/tensorboy/PIDOptimizer.
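To make the abstract's idea concrete, here is a minimal toy sketch contrasting SGD-M with a PID-style update that also uses the change of gradients. It is not the authors' implementation; their code is in the linked repository. It assumes SGD-M in the form v ← α·v − r·g, θ ← θ + v. Unrolled, v is the current gradient (proportional part) plus a decayed sum of past gradients (integral part). The sketch adds a derivative term d, an exponential moving average of successive gradient differences scaled by a gain `kd`. The function names `sgd_momentum` and `pid_optimizer`, this particular form of the derivative term, the quadratic objective f(θ) = θ²/2, and the value `kd = 5.0` are all hypothetical choices for illustration.

```python
from typing import Callable, List

Grad = Callable[[float], float]


def sgd_momentum(grad: Grad, theta0: float, lr: float = 0.1,
                 momentum: float = 0.9, steps: int = 100) -> List[float]:
    """SGD-M as v <- momentum * v - lr * g, theta <- theta + v.

    Starting from v = 0, unrolling gives
    v_{t+1} = -lr * sum_{i<=t} momentum**(t - i) * g_i:
    a current-gradient (P) term plus a decayed sum of past gradients (I).
    """
    theta, v = theta0, 0.0
    path = [theta]
    for _ in range(steps):
        g = grad(theta)
        v = momentum * v - lr * g
        theta += v
        path.append(theta)
    return path


def pid_optimizer(grad: Grad, theta0: float, lr: float = 0.1,
                  momentum: float = 0.9, kd: float = 5.0,
                  steps: int = 100) -> List[float]:
    """Hypothetical PID-style update: the SGD-M step (P + I) plus a
    derivative term d, a moving average of the change in gradient,
    scaled by the assumed gain kd."""
    theta, v, d = theta0, 0.0, 0.0
    g_prev = None
    path = [theta]
    for _ in range(steps):
        g = grad(theta)
        diff = 0.0 if g_prev is None else g - g_prev  # change of gradient
        v = momentum * v - lr * g                      # identical to SGD-M
        d = momentum * d + (1.0 - momentum) * diff     # smoothed derivative
        # On this toy problem, -kd * d opposes v while the gradient falls,
        # which damps the overshoot.
        theta += v - kd * d
        g_prev = g
        path.append(theta)
    return path


def quad_grad(theta: float) -> float:
    """Gradient of the toy objective f(theta) = theta**2 / 2."""
    return theta


if __name__ == "__main__":
    sgdm_path = sgd_momentum(quad_grad, 1.0)
    pid_path = pid_optimizer(quad_grad, 1.0)
    print(f"SGD-M: min {min(sgdm_path):.4f}, final {sgdm_path[-1]:.2e}")
    print(f"PID:   min {min(pid_path):.4f}, final {pid_path[-1]:.2e}")
```

Starting from θ = 1 on this quadratic, SGD-M overshoots to about −0.60 before oscillating toward 0. The PID-style variant with `kd = 5.0` decreases monotonically toward 0 without crossing it. Setting `kd = 0.0` recovers SGD-M exactly, which mirrors the framing of SGD-M as the P and I parts of a PID controller. On real, noisy minibatch gradients the difference term is much noisier, so the gain and smoothing would need tuning.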