Abstract: Contemporary state-of-the-art neural networks have increasingly large numbers of parameters, which prevents their deployment on devices with limited computational power. Pruning is one technique to remove unnecessary weights and reduce resource requirements for training and inference. In addition, for ML tasks where the input data is multi-dimensional, using higher-dimensional data embeddings such as complex numbers or quaternions has been shown to reduce the parameter count while maintaining accuracy. In this work, we conduct pruning on real and quaternion-valued implementations of different architectures on classification tasks. We find that for some architectures, at very high sparsity levels, quaternion models provide higher accuracies than their real counterparts. For example, on CIFAR-10 image classification using Conv-4, with $3\%$ of the original model's parameters, the pruned quaternion version outperforms the pruned real-valued version by more than $10\%$. Experiments on various network architectures and datasets show that for deployment in extremely resource-constrained environments, a sparse quaternion network might be a better candidate than a sparse real-valued model of similar architecture.
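The parameter saving that motivates quaternion layers can be checked with simple arithmetic. The sketch below is a minimal illustration, assuming a fully connected layer whose input and output widths (counted in real features) are multiples of 4, and a quaternion version that groups every four real features into one quaternion with one quaternion weight (four real numbers) per input/output pair. Biases are ignored, and the helper names are hypothetical, not taken from the paper.

```python
def real_dense_params(n_in: int, n_out: int) -> int:
    """Real-valued weights in a dense layer (bias ignored)."""
    return n_in * n_out


def quaternion_dense_params(n_in: int, n_out: int) -> int:
    """Real numbers stored by a quaternion dense layer (bias ignored).

    Assumption: n_in and n_out count real features and are multiples of 4,
    so the layer maps n_in // 4 input quaternions to n_out // 4 output
    quaternions, and each quaternion weight holds 4 real components.
    """
    if n_in % 4 or n_out % 4:
        raise ValueError("widths must be multiples of 4")
    return 4 * (n_in // 4) * (n_out // 4)


def remaining_after_pruning(total: int, fraction_kept: float) -> int:
    """Number of parameters left when only `fraction_kept` of them survive."""
    return round(total * fraction_kept)


if __name__ == "__main__":
    real = real_dense_params(256, 256)
    quat = quaternion_dense_params(256, 256)
    print(real, quat, quat / real)  # 65536 16384 0.25
    # Example only: keeping 3% of the real layer's weights.
    print(remaining_after_pruning(real, 0.03))  # 1966
```

Under these assumptions a quaternion layer of the same width stores a quarter of the real layer's weights, because the four real components of each quaternion weight are shared across all four components of the output through the Hamilton product.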

What problem does this paper attempt to address?

This paper addresses how to deploy efficient neural network models on resource-constrained devices, in particular how to maintain high accuracy while greatly reducing the number of parameters. Specifically, the authors combine pruning with quaternion representations to reduce parameter counts, and evaluate the resulting models on several architectures and datasets.

### Problem Background

State-of-the-art neural networks usually have large numbers of parameters, which makes them difficult to deploy on devices with limited computing power, such as mobile phones. Researchers have proposed various model compression methods, including pruning, low-rank factorization, and knowledge distillation. Pruning reduces the parameter count by removing redundant weights or neurons. In addition, higher-dimensional data embeddings such as complex numbers or quaternions have been shown to reduce the number of parameters while maintaining model accuracy.

### Core Problems of the Paper

1. **The effect of combining pruning and quaternions**: Do quaternion neural networks achieve higher accuracy than real-valued networks under extreme sparsity, that is, after most of the model's parameters have been removed?
2. **Model selection in resource-constrained environments**: When resources are very limited, which type of sparse model, real-valued or quaternion, is better suited for deployment?

### Experimental Design

- **Datasets**: MNIST, CIFAR-10, and CIFAR-100.
- **Model architectures**: LeNet-300-100, Conv-2, Conv-4, and Conv-6.
- **Methods**: Iteratively prune the real-valued and quaternion implementations of each model, then retrain the pruned models to evaluate their performance.

### Main Findings

1. **Performance at high sparsity**: At extreme sparsity (roughly 10% or less of the original parameters), quaternion models usually achieve higher accuracy than real-valued models. For example, on CIFAR-10 with Conv-4, when the model is pruned to 3% of its original parameter count, the pruned quaternion model outperforms the pruned real-valued model by more than 10% in accuracy.
2. **Lottery Ticket Hypothesis**: The pruned quaternion models can be retrained from scratch to an accuracy comparable to that of the unpruned model, indicating that the Lottery Ticket Hypothesis also holds for quaternion models.
3. **Exceptions**: For some models, such as LeNet-300-100, the real-valued model outperforms the quaternion model at all sparsities, possibly because these models are already very efficient and over-parameterized.

### Conclusions

This research suggests that at extreme sparsity, quaternion neural networks can be an effective approach to model compression, particularly for tasks with multi-dimensional inputs that must run on resource-constrained devices. However, for some architectures quaternion models may perform worse than real-valued models, and the reasons for this require further investigation.

### Quaternion Formulas

A quaternion $q$ can be written as
\[q = r + xi + yj + zk,\]
where $r, x, y, z \in \mathbb{R}$ and the imaginary units satisfy
\[i^{2} = j^{2} = k^{2} = ijk = -1.\]
The Hamilton product of two quaternions $q_{1}$ and $q_{2}$ is
\[
\begin{aligned}
q_{1} \otimes q_{2} ={}& (r_{1}r_{2} - x_{1}x_{2} - y_{1}y_{2} - z_{1}z_{2}) \\
&+ (r_{1}x_{2} + x_{1}r_{2} + y_{1}z_{2} - z_{1}y_{2})\, i \\
&+ (r_{1}y_{2} - x_{1}z_{2} + y_{1}r_{2} + z_{1}x_{2})\, j \\
&+ (r_{1}z_{2} + x_{1}y_{2} - y_{1}x_{2} + z_{1}r_{2})\, k.
\end{aligned}
\]
Because this product is linear in $q_{2}$, left multiplication by a quaternion $q = r + xi + yj + zk$ can be written, for ease of calculation, as a real $4 \times 4$ matrix acting on the components of the right-hand factor:
\[\mathbf{L}(q) = \begin{bmatrix} r & -x & -y & -z \\ x & r & -z & y \\ y & z & r & -x \\ z & -y & x & r \end{bmatrix}, \qquad q \otimes q_{2} = \mathbf{L}(q)\,(r_{2}, x_{2}, y_{2}, z_{2})^{\top}.\]
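The quaternion algebra can be checked numerically. The following minimal sketch in plain Python implements the standard Hamilton product component by component, together with the real $4 \times 4$ left-multiplication matrix, so one can confirm that the two agree and that $i^2 = j^2 = k^2 = ijk = -1$. The helper names `hamilton`, `left_matrix`, and `matvec` are illustrative, not from the paper.

```python
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (r, x, y, z)


def hamilton(q1: Quaternion, q2: Quaternion) -> Quaternion:
    """Standard Hamilton product q1 ⊗ q2, written component by component."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k
    )


def left_matrix(q: Quaternion) -> List[List[float]]:
    """Real 4x4 matrix L(q) such that q ⊗ p equals L(q) times p's components."""
    r, x, y, z = q
    return [
        [r, -x, -y, -z],
        [x, r, -z, y],
        [y, z, r, -x],
        [z, -y, x, r],
    ]


def matvec(m: List[List[float]], v: Quaternion) -> Quaternion:
    """Multiply a 4x4 matrix by a 4-component vector."""
    return tuple(sum(m[row][col] * v[col] for col in range(4)) for row in range(4))


if __name__ == "__main__":
    i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
    print(hamilton(i, j))               # (0, 0, 0, 1), i.e. ij = k
    print(hamilton(hamilton(i, j), k))  # (-1, 0, 0, 0), i.e. ijk = -1
    p = (1, 2, 3, 4)
    print(hamilton(j, p) == matvec(left_matrix(j), p))  # True
```

In quaternion layers built on real-valued tensor libraries, this matrix form is typically what lets the Hamilton product of a weight and an input be computed with ordinary real matrix arithmetic. Note that the product is not commutative: `hamilton(j, i)` gives $-k$ rather than $k$.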
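The iterative prune-and-retrain procedure from the experimental design can be sketched in the style of lottery-ticket iterative magnitude pruning: each round trains from the original initialization under the current mask, removes a fixed fraction of the smallest-magnitude surviving weights, and rewinds. This is a hypothetical sketch, not the paper's code. The `train` callable, the per-round pruning rate, and scoring a quaternion weight by its Euclidean norm (so that all four components are pruned together) are assumptions, not details confirmed by the summary.

```python
import math
from typing import Callable, List, Sequence, Tuple, Union

# A weight is either a real number or a quaternion (r, x, y, z).
Weight = Union[float, Tuple[float, float, float, float]]


def magnitude(w: Weight) -> float:
    """|w| for a real weight; assumed norm of the 4 components for a quaternion."""
    if isinstance(w, tuple):
        return math.sqrt(sum(c * c for c in w))
    return abs(w)


def prune_once(weights: Sequence[Weight], mask: List[bool], rate: float) -> List[bool]:
    """Remove the `rate` fraction of still-active weights with the smallest magnitude."""
    active = [idx for idx, keep in enumerate(mask) if keep]
    n_remove = int(len(active) * rate)
    # Sort active indices by magnitude, smallest first, and drop the first n_remove.
    doomed = set(sorted(active, key=lambda idx: magnitude(weights[idx]))[:n_remove])
    return [keep and idx not in doomed for idx, keep in enumerate(mask)]


def iterative_prune(
    init: Sequence[Weight],
    train: Callable[[List[Weight], List[bool]], List[Weight]],
    rounds: int,
    rate: float,
) -> List[bool]:
    """Train, prune, rewind to `init`, repeat; returns the final keep-mask."""
    mask = [True] * len(init)
    for _ in range(rounds):
        trained = train(list(init), mask)  # hypothetical trainer; restarts from init
        mask = prune_once(trained, mask, rate)
    return mask


if __name__ == "__main__":
    init = [float(v) for v in range(100)]
    # Toy stand-in for training: just zero out the pruned weights.
    toy_train = lambda w, m: [x if keep else 0.0 for x, keep in zip(w, m)]
    mask = iterative_prune(init, toy_train, rounds=3, rate=0.2)
    print(sum(mask))  # 52
```

The surviving mask is then applied to the original initialization and the sparse network is retrained from scratch. With an example rate of 20% per round, reaching about 3% of the weights takes roughly 16 rounds, since $0.8^{16} \approx 0.028$.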

Neural Networks at a Fraction with Pruned Quaternions