Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models

Chenqian Yan, Songwei Liu, Hongjian Liu, Xurui Peng, Xiaojian Wang, Fangmin Chen, Lean Fu, Xing Mei

2024-10-30

Abstract: Stable Diffusion Models (SDMs) have shown remarkable proficiency in image synthesis. However, their broad application is impeded by their large model sizes and intensive computational requirements, which typically require expensive cloud servers for deployment. On the flip side, while there are many compact models tailored for edge devices that can reduce these demands, they often compromise on semantic integrity and visual quality when compared to full-sized SDMs. To bridge this gap, we introduce Hybrid SD, an innovative, training-free SDM inference framework designed for edge-cloud collaborative inference. Hybrid SD distributes the early steps of the diffusion process to large models deployed on cloud servers, enhancing semantic planning. Small, efficient models deployed on edge devices can then be integrated to refine visual details in the later stages. Acknowledging the diversity of edge devices with differing computational and storage capacities, we apply structural pruning to the SDM U-Net and train a lightweight VAE. Empirical evaluations demonstrate that our compressed models achieve state-of-the-art parameter efficiency (225.8M) on edge devices with competitive image quality. Additionally, Hybrid SD reduces the cloud cost by 66% with edge-cloud collaborative inference.
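To make the step split concrete, here is a minimal Python sketch. It is not the authors' implementation. The only idea taken from the abstract is that the early denoising steps run on a large model and the later steps on a small one. Everything else is an assumption for illustration: the deterministic DDIM update, the linear beta schedule, the hypothetical noise-predicting denoisers `cloud_model` and `edge_model`, and the names `hybrid_sample`, `ddim_step` and `switch_step`. The sketch also presumes both models share one latent space, since the cloud's intermediate latent is handed to the edge model unchanged.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# A denoiser predicts the noise in latent x at integer timestep t.
# cloud_model / edge_model below are hypothetical stand-ins, not real checkpoints.
Denoiser = Callable[[np.ndarray, int], np.ndarray]


def ddim_step(x_t: np.ndarray, eps: np.ndarray,
              alpha_bar_t: float, alpha_bar_prev: float) -> np.ndarray:
    """One deterministic DDIM update (eta = 0) from timestep t to the previous timestep."""
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps


def hybrid_sample(x_T: np.ndarray, timesteps: Sequence[int], alpha_bars: np.ndarray,
                  cloud_model: Denoiser, edge_model: Denoiser,
                  switch_step: int) -> Tuple[np.ndarray, List[str]]:
    """Denoise with cloud_model for the first `switch_step` steps, then with edge_model.

    Assumes both models share one latent space: the intermediate latent
    produced by the cloud is passed to the edge model unchanged.
    """
    x = x_T
    trace: List[str] = []  # records which side ran each step, for inspection
    for i, t in enumerate(timesteps):
        on_cloud = i < switch_step
        model = cloud_model if on_cloud else edge_model
        trace.append("cloud" if on_cloud else "edge")
        eps = model(x, t)
        # The final step targets alpha_bar = 1, i.e. the clean latent.
        alpha_bar_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps, float(alpha_bars[t]), float(alpha_bar_prev))
    return x, trace


if __name__ == "__main__":
    betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear beta schedule
    alpha_bars = np.cumprod(1.0 - betas)
    timesteps = list(range(999, -1, -111))  # 10 steps: 999, 888, ..., 111, 0
    x_T = np.random.default_rng(0).standard_normal((4, 8, 8))
    latent, trace = hybrid_sample(x_T, timesteps, alpha_bars,
                                  cloud_model=lambda x, t: 0.1 * x,
                                  edge_model=lambda x, t: 0.05 * x,
                                  switch_step=4)
    print(trace)  # first 4 steps on the cloud, remaining 6 on the edge
```

In this sketch, `switch_step` is the single knob trading cloud compute against edge compute. Setting it to `len(timesteps)` recovers cloud-only sampling, and setting it to 0 gives edge-only sampling.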

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses **the high computational cost and deployment expense of Stable Diffusion Models (SDMs) in practical applications.** Although SDMs excel at image synthesis, their large model size and heavy compute requirements usually confine deployment to expensive cloud servers. Compact models built for edge devices lower these demands, but they typically fall short of full-sized SDMs in semantic integrity and visual quality. To bridge this gap, the authors propose **Hybrid SD**, a training-free SDM inference framework for edge-cloud collaborative inference. Hybrid SD assigns the early steps of the diffusion process to large models on cloud servers to strengthen semantic planning, and hands the later steps to small, efficient models on edge devices, which refine visual details. To accommodate edge devices with differing computational and storage capacities, the authors also apply structural pruning to the SDM U-Net and train a lightweight VAE. Experiments show that the compressed edge models reach state-of-the-art parameter efficiency (225.8M parameters) with competitive image quality, and that edge-cloud collaborative inference reduces cloud cost by 66%.
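Structural pruning removes whole channels rather than individual weights, so the pruned U-Net remains a smaller dense network. This summary does not state which pruning criterion the paper uses. The sketch below therefore substitutes a common, generic criterion, ranking a convolution's output channels by L1 norm, purely to illustrate the idea. `l1_channel_scores`, `prune_conv_pair` and `keep_ratio` are hypothetical names. A real U-Net would also need its residual, normalization, and attention dependencies handled consistently.

```python
import numpy as np


def l1_channel_scores(weight: np.ndarray) -> np.ndarray:
    """L1 norm of each output channel of a conv weight shaped (out, in, kh, kw)."""
    return np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)


def prune_conv_pair(w1: np.ndarray, b1: np.ndarray, w2: np.ndarray, keep_ratio: float):
    """Drop the lowest-L1 output channels of conv1 and the matching input channels of conv2.

    Generic L1-norm criterion, assumed for illustration only. Removing whole
    channels keeps both layers dense, so no sparse kernels are needed.
    """
    n_keep = max(1, int(round(w1.shape[0] * keep_ratio)))
    scores = l1_channel_scores(w1)
    # Highest-scoring channel indices, re-sorted so surviving channels keep their order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return w1[keep], b1[keep], w2[:, keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((8, 4, 3, 3))   # conv1: 4 -> 8 channels
    b1 = rng.standard_normal(8)
    w2 = rng.standard_normal((16, 8, 3, 3))  # conv2: 8 -> 16 channels
    w1[[2, 5]] *= 0.01                       # make channels 2 and 5 nearly inert
    w1p, b1p, w2p, keep = prune_conv_pair(w1, b1, w2, keep_ratio=0.75)
    print(keep, w1p.shape, w2p.shape)
```

Pruning conv1's output channels forces the same channels to be removed from conv2's input dimension. This coupling is why structural pruning must track layer dependencies through the whole network.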

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge

A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization

BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion

A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration

Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning

Attention-aware Semantic Communications for Collaborative Inference

Hybrid SLM and LLM for Edge-Cloud Collaborative Inference

Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution

SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models

EdgeFusion: On-Device Text-to-Image Generation

Collaborative Inference for Deep Neural Networks in Edge Environments

Diffusion Probabilistic Model Made Slim

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model

ED-ViT: Splitting Vision Transformer for Distributed Inference on Edge Devices

Add-SD: Rational Generation without Manual Reference