Abstract: Efficient neural networks have received ever-increasing attention with the evolution of convolutional neural networks (CNNs), especially for deployment on embedded and mobile platforms. A major obstacle to obtaining such networks is the computational cost of the search itself: even recent differentiable neural architecture search (DNAS) must sample a small number of candidate architectures in order to select the optimal one. To address this computational efficiency issue, we introduce a novel architecture parameterization based on a scaled sigmoid function, and propose a general Differentiable Neural Architecture Learning (DNAL) method that obtains efficient neural networks without the need to evaluate candidate networks. Specifically, for stochastic supernets as well as conventional CNNs, we build a new channel-wise module layer whose architecture components are controlled by a scaled sigmoid function. We train these neural network models from scratch. The network optimization is decoupled into weight optimization and architecture optimization, which avoids interaction between the two types of parameters and alleviates the vanishing gradient problem. We address the non-convex optimization problem of efficient neural networks with the continuous scaled sigmoid method instead of the common softmax method. Extensive experiments demonstrate that our DNAL method delivers superior performance in terms of efficiency, and adapts to conventional CNNs (e.g., VGG16 and ResNet50), lightweight CNNs (e.g., MobileNetV2) and stochastic supernets (e.g., ProxylessNAS). The optimal neural networks learned by DNAL surpass those produced by state-of-the-art methods on the benchmark CIFAR-10 and ImageNet-1K datasets in accuracy, model size and computational complexity. Our source code is available at https://github.com/QingbeiGuo/DNAL.git.
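
The idea of a channel-wise gate controlled by a scaled sigmoid can be illustrated with a small sketch. The abstract does not state the exact formula, so the code below assumes a gate of the form sigmoid(delta * s), where s is a per-channel architecture parameter and delta is a scale factor; as delta grows, the gates move toward 0 or 1, which is what would allow channels to be kept or removed without evaluating candidate networks. All names here (`scaled_sigmoid`, `gate_channels`, `kept_channels`, the 0.5 threshold, the sample values) are hypothetical and are not taken from the authors' code.

```python
import math


def scaled_sigmoid(s, delta):
    """Sigmoid of delta * s, computed in a numerically stable way."""
    z = delta * s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate_channels(channels, arch_params, delta):
    """Scale each channel (a list of floats) by its scaled-sigmoid gate."""
    gates = [scaled_sigmoid(s, delta) for s in arch_params]
    return [[g * x for x in ch] for g, ch in zip(gates, channels)]


def kept_channels(arch_params, delta, threshold=0.5):
    """Indices of channels whose gate exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(arch_params)
            if scaled_sigmoid(s, delta) > threshold]


if __name__ == "__main__":
    # Hypothetical architecture parameters, one per channel.
    arch_params = [0.8, -0.3, 0.05, -1.2]
    # Increasing delta pushes each gate toward a binary keep/drop decision.
    for delta in (1.0, 10.0, 100.0):
        gates = [round(scaled_sigmoid(s, delta), 3) for s in arch_params]
        print(delta, gates)
    print(kept_channels(arch_params, delta=100.0))
```

In this sketch the gate stays differentiable in s for any finite delta, so the architecture parameters could in principle be trained by gradient descent alongside the weights; how DNAL actually schedules delta and alternates the two optimizations is not specified in the abstract.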

Differentiable homotopy methods for gradually reinforcing the training of fully connected neural networks

Dynamic node creation and fast learning algorithm for a hybrid feedforward neural network

HCFNN: High-order Coverage Function Neural Network for Image Classification

A new differentiable architecture search method for optimizing convolutional neural networks in the digital twin of intelligent robotic grasping

HoD-Net: High-Order Differentiable Deep Neural Networks and Applications

Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

Efficient Incremental Training for Deep Convolutional Neural Networks

Gradient rectified parameter unit of the fully connected layer in convolutional neural networks

EA-CG: An Approximate Second-Order Method for Training Fully-Connected Neural Networks

Differentiable Forward and Backward Fixed-Point Iteration Layers

Differentiable neural architecture learning for efficient neural networks

Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks

Variable three-term conjugate gradient method for training artificial neural networks

A Convergent ADMM Framework for Efficient Neural Network Training

Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations

Enhancing Convolutional Neural Networks with Higher-Order Numerical Difference Methods

Towards optimal hierarchical training of neural networks

Disentangling Trainability and Generalization in Deep Neural Networks

DeepHGCN: Toward Deeper Hyperbolic Graph Convolutional Networks

Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters

Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One