Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation

Tianyi Chen, Zhi-Qin John Xu

2024-07-01

Abstract: Neural networks have been applied extensively to a wide variety of tasks with striking results. Applying neural networks to scientific problems is an important research direction that is attracting growing attention. In scientific applications, networks are generally of moderate size, mainly to keep inference fast. Comparison with traditional algorithms is also inevitable in this setting. Because these applications often require rapid computation, reducing network size is increasingly important. Existing work has found that the power of neural networks comes primarily from their non-linearity. Theoretical work has shown that under strong non-linearity, neurons in the same layer tend to behave similarly, a phenomenon known as condensation. Condensation offers an opportunity to reduce a neural network to a smaller subnetwork with similar performance. In this article, we propose a condensation reduction algorithm to verify the feasibility of this idea on practical problems. Our reduction method currently applies to both fully connected networks and convolutional networks, with positive results. In a complex combustion acceleration task, we reduced the neural network to 41.7% of its original size while maintaining prediction accuracy. In the CIFAR10 image classification task, we reduced the network to 11.5% of its original size while still maintaining a satisfactory validation accuracy. Our method can be applied to most trained neural networks, reducing computational pressure and improving inference speed.

Machine Learning

What problem does this paper attempt to address?

This paper addresses how to reduce the size of moderate-size deep neural networks used in scientific applications while maintaining their performance. Specifically, it uses the condensation phenomenon to simplify the network structure, cutting the number of parameters and increasing inference speed without significantly affecting prediction accuracy. This makes the network better suited to resource-constrained environments such as embedded systems, mobile devices and sensors. The paper also tests the existing theory that, under strongly non-linear conditions, neurons in the same layer tend to behave similarly (condensation), so that the network can be shrunk by merging these similar neurons. Using a condensation-based reduction method, the authors reduced a neural network for a combustion simulation acceleration task to 41.7% of its original size, and reduced a network for the CIFAR10 image classification task to 11.5% of its original size, while maintaining prediction accuracy in the first case and a satisfactory validation accuracy in the second. These results indicate that the method applies to both fully connected and convolutional networks.
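The idea of merging similar neurons can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the function name `merge_condensed_neurons`, the cosine-similarity criterion, the `threshold` parameter, and the greedy grouping are all assumptions made for illustration. The sketch assumes a single ReLU hidden layer, where a neuron whose input weights and bias are a positive multiple of another neuron's can be folded into it exactly by rescaling and summing the output weights. For neurons that are only approximately aligned the merge is approximate, and a reduced network would typically need further fine-tuning.

```python
import numpy as np

def merge_condensed_neurons(W, b, A, threshold=0.99):
    """Merge hidden ReLU neurons whose (weights, bias) directions nearly coincide.

    W: (hidden, n_in) input weights, b: (hidden,) biases,
    A: (n_out, hidden) output weights. Illustrative sketch only.
    """
    V = np.hstack([W, b[:, None]])                 # one row per neuron: weights + bias
    norms = np.linalg.norm(V, axis=1)
    U = V / np.maximum(norms, 1e-12)[:, None]      # unit directions (zero rows stay zero)
    cos = U @ U.T                                  # pairwise cosine similarity

    hidden = V.shape[0]
    taken = np.zeros(hidden, dtype=bool)
    groups = []
    for i in range(hidden):                        # greedy grouping around neuron i
        if taken[i]:
            continue
        members = [i] + [j for j in range(i + 1, hidden)
                         if not taken[j] and cos[i, j] >= threshold]
        taken[members] = True
        groups.append(members)

    W_new = np.zeros((len(groups), W.shape[1]))
    b_new = np.zeros(len(groups))
    A_new = np.zeros((A.shape[0], len(groups)))
    for g, members in enumerate(groups):
        rep = members[0]                           # representative neuron of the group
        W_new[g], b_new[g] = W[rep], b[rep]
        for j in members:
            # ReLU is positively homogeneous: relu(c * z) = c * relu(z) for c > 0,
            # so neuron j contributes its output weight scaled by the norm ratio.
            scale = 1.0 if j == rep else norms[j] / norms[rep]
            A_new[:, g] += scale * A[:, j]
    return W_new, b_new, A_new

def forward(X, W, b, A):
    """Two-layer ReLU network: X is (batch, n_in), result is (batch, n_out)."""
    return np.maximum(X @ W.T + b, 0.0) @ A.T

# Toy network: neuron 1 is exactly 2x neuron 0, neuron 2 points elsewhere.
W = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
b = np.array([0.5, 1.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])

W_small, b_small, A_small = merge_condensed_neurons(W, b, A)
X = np.random.default_rng(0).normal(size=(16, 2))
max_diff = np.max(np.abs(forward(X, W, b, A) - forward(X, W_small, b_small, A_small)))
```

In this toy case the two aligned neurons collapse into one, so the hidden layer shrinks from three neurons to two and `max_diff` stays at floating-point noise. With a lower `threshold`, more neurons are merged at the cost of a larger output mismatch.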

Deep Neural Network Acceleration with Sparse Prediction Layers

DCCD: Reducing Neural Network Redundancy Via Distillation

A Model Compression Method Using Significant Data and Knowledge Distillation

Layer-Wise Training To Create Efficient Convolutional Neural Networks

Sensitivity-Oriented Layer-Wise Acceleration and Compression for Convolutional Neural Network

Learning Efficient Convolutional Networks Through Network Slimming

Self-Compressing Neural Networks

Convolution neural network compression method with scale factor

Deep Learning Model Compression with Rank Reduction in Tensor Decomposition

TEC-CNN: Towards Efficient Compressing Convolutional Neural Nets with Low-rank Tensor Decomposition

A New Compression Method for Deep Neural Networks with Accuracy Improvement

A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

A Highly Efficient Training-Aware Convolutional Neural Network Compression Paradigm

An Accuracy-Preserving Neural Network Compression Via Tucker Decomposition

Merging Similar Neurons for Deep Networks Compression

CMD: Controllable Matrix Decomposition with Global Optimization for Deep Neural Network Compression

Towards More Efficient and Effective Inference: The Joint Decision of Multi-Participants

Sensitivity-based Acceleration and Compression Algorithm for Convolution Neural Network

Compressing Deep Networks by Neuron Agglomerative Clustering