Abstract: MLaaS Service Providers (SPs) that hold a Neural Network want to keep its weights secret. Users, in turn, want to use the SP's Neural Network for inference without revealing their data. Multi-Party Computation (MPC) offers a way to satisfy both requirements. Computations in MPC involve communication, as the parties send data back and forth, and non-linear operations are usually the main bottleneck, consuming the bulk of the communication bandwidth. In this paper, we focus on ResNets, which serve as the backbone for many Computer Vision tasks, and we aim to reduce their non-linear components, specifically the number of ReLUs. Our key insight is that spatially close pixels exhibit correlated ReLU responses. Building on this insight, we replace the per-pixel ReLU operation with a single ReLU operation per patch. We term this approach 'Block-ReLU'. Since different layers in a Neural Network correspond to different feature hierarchies, it makes sense to allow the patch size to vary across layers. We devise an algorithm that chooses the optimal set of patch sizes through a novel reduction of the problem to the Knapsack Problem. We demonstrate our approach in the semi-honest secure 3-party setting on four problems: classifying ImageNet with a ResNet50 backbone, classifying CIFAR100 with a ResNet18 backbone, Semantic Segmentation of ADE20K with a MobileNetV2 backbone, and Semantic Segmentation of Pascal VOC 2012 with a ResNet50 backbone. Our approach achieves performance competitive with a handful of existing methods. Our source code is publicly available at https://github.com/yg320/secure_inference.
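To make the per-patch idea concrete, the following is a minimal NumPy sketch of a Block-ReLU on a single 2-D feature map. It is an illustration only, not the authors' implementation: the abstract does not state how the per-patch decision is made, so the sketch assumes it is taken from the sign of the patch mean, and the names `block_relu` and `relu_count` are hypothetical. It also runs in the clear, without any MPC protocol.

```python
import numpy as np


def block_relu(x, patch_h, patch_w):
    """Apply one ReLU decision per (patch_h x patch_w) patch of a 2-D map.

    Assumption (not stated in the abstract): a patch is kept unchanged
    when its mean is positive and zeroed otherwise.
    """
    h, w = x.shape
    if h % patch_h or w % patch_w:
        raise ValueError("feature map must be divisible by the patch size")
    # Split the map into non-overlapping patches:
    # axes are (patch row, row in patch, patch column, column in patch).
    patches = x.reshape(h // patch_h, patch_h, w // patch_w, patch_w)
    # One comparison per patch; in MPC this is the expensive operation.
    keep = patches.mean(axis=(1, 3), keepdims=True) > 0
    # Broadcast the single decision to every pixel of its patch.
    return np.where(keep, patches, 0.0).reshape(h, w)


def relu_count(h, w, patch_h, patch_w):
    """Number of sign comparisons needed for an h x w map."""
    return (h // patch_h) * (w // patch_w)


x = np.array([[1.0, -2.0, -1.0, -1.0],
              [3.0, 2.0, -1.0, 0.5]])
y = block_relu(x, 2, 2)  # left patch is kept, right patch is zeroed
```

With a 1x1 patch the function reduces to the ordinary per-pixel ReLU, while a 2x2 patch on a 56x56 map needs 784 comparisons instead of 3136. Choosing a different patch size per layer trades this saving against accuracy, which is the selection problem the abstract reduces to the Knapsack Problem.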

StreamliNet: Cost-aware Layer-wise Neural Network Linearization for Fast and Accurate Private Inference

Deep Neural Network Acceleration with Sparse Prediction Layers

Linearizing Models for Efficient yet Robust Private Inference

AutoReP: Automatic ReLU Replacement for Fast Private Network Inference

DeepReShape: Redesigning Neural Networks for Efficient Private Inference

Sphynx: ReLU-Efficient Network Design for Private Inference

Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference

FastSecNet: An Efficient Cryptographic Framework for Private Neural Network Inference

DReP: Deep ReLU pruning for fast private inference

CrossNet: A Low-Latency MLaaS Framework for Privacy-Preserving Neural Network Inference on Resource-Limited Devices

Disparate Impact on Group Accuracy of Linearization for Private Inference

AERO: Softmax-Only LLMs for Efficient Private Inference

C2PI: An Efficient Crypto-Clear Two-Party Neural Network Private Inference

Securing Neural Networks with Knapsack Optimization

xMLP: Revolutionizing Private Inference with Exclusive Square Activation

PP-Stream: Toward High-Performance Privacy-Preserving Neural Network Inference via Distributed Stream Processing

Don't Think It Twice: Exploit Shift Invariance for Efficient Online Streaming Inference of CNNs

Reducing ReLU Count for Privacy-Preserving CNN Speedup

B-LNN: Inference-time linear model for secure neural network inference

Compressing Neural Networks Using Learnable 1-D Non-Linear Functions

PrivCirNet: Efficient Private Inference via Block Circulant Transformation