Abstract:

Purpose: Deep convolutional neural networks (DCNN) are currently ubiquitous in medical imaging. While their versatility and high-quality results for common image analysis tasks including segmentation, localisation and prediction are astonishing, their large representational power comes at the cost of highly demanding computational effort. This limits their practical application to image-guided interventions and diagnostic (point-of-care) support on mobile devices without graphics processing units (GPU).

Methods: We propose a new scheme that approximates both trainable weights and neural activations in deep networks by ternary values and tackles the open question of backpropagation when dealing with non-differentiable functions. Our solution enables the removal of the expensive floating-point matrix multiplications throughout any convolutional neural network and replaces them with energy- and time-saving binary operators and population counts.

Results: We evaluate our approach for the segmentation of the pancreas in CT. Here, our ternary approximation within a fully convolutional network reduces memory by more than 90% and achieves high accuracy (without any post-processing), with a Dice overlap of 71.0% that comes close to the one obtained with networks using high-precision weights and activations. We further provide a concept for sub-second inference without GPUs and demonstrate significant improvements over binary quantisation and over a variant without our proposed ternary hyperbolic tangent continuation.

Conclusions: We present a key enabling technique for highly efficient DCNN inference without GPUs that will help to bring the advances of deep learning to practical clinical applications. It also holds great promise for improving accuracy in large-scale medical data retrieval.
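To make the idea of replacing floating-point multiplications with binary operators and population counts more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a simple fixed-threshold ternarisation (the `threshold` value and the function names `ternarize`, `pack`, `popcount` and `ternary_dot` are hypothetical) and encodes each ternary vector as two bitmasks, one for the +1 entries and one for the -1 entries. The dot product then reduces to four bitwise ANDs and four population counts.

```python
import numpy as np


def ternarize(x, threshold=0.5):
    """Map real values to {-1, 0, +1}.

    The fixed threshold is an assumption made for illustration only;
    the abstract does not specify how the ternary values are chosen.
    """
    x = np.asarray(x, dtype=np.float64)
    return (np.sign(x) * (np.abs(x) > threshold)).astype(np.int8)


def pack(t):
    """Encode a ternary vector as two integer bitmasks (plus, minus)."""
    plus, minus = 0, 0
    for i, v in enumerate(t):
        if v > 0:
            plus |= 1 << i      # bit i set: entry i is +1
        elif v < 0:
            minus |= 1 << i     # bit i set: entry i is -1
    return plus, minus


def popcount(n):
    """Number of set bits in a non-negative integer."""
    return bin(n).count("1")


def ternary_dot(a, b):
    """Dot product of two packed ternary vectors without multiplications."""
    a_plus, a_minus = a
    b_plus, b_minus = b
    # Equal non-zero signs contribute +1, opposite signs contribute -1,
    # and any position holding a zero contributes nothing.
    same = popcount(a_plus & b_plus) + popcount(a_minus & b_minus)
    opposite = popcount(a_plus & b_minus) + popcount(a_minus & b_plus)
    return same - opposite


weights = ternarize([0.9, -0.2, -0.7, 0.1])       # [ 1, 0, -1, 0]
activations = ternarize([0.8, 0.6, -0.9, -0.3])   # [ 1, 1, -1, 0]
result = ternary_dot(pack(weights), pack(activations))
print(result)  # 2
```

Zero entries drop out of both bitmasks, which is where the sparsity of a ternary representation pays off. Storing each ternary value in two bits instead of a 32-bit float would by itself be a 93.75% reduction, which is in line with the reported memory figure, although the encoding actually used is not stated here.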

TernaryNet: faster deep model inference without GPUs for medical 3D segmentation using sparse and binary convolutions
