EdgeDiffuse: Optimizing Stable Diffusion Models
Project Overview
This project, developed in collaboration with Amazon, aimed to democratize AI image generation by enabling Stable Diffusion models to run efficiently on affordable edge devices (e.g., Orange Pi). The core challenge was to significantly reduce model size and inference latency while maintaining acceptable image quality.
Objectives
- Democratize AI: Enable powerful generative models to run on consumer-grade edge hardware.
- Efficiency: Optimize for limited memory and compute resources.
- Quality Preservation: Ensure that compression techniques do not severely degrade the visual fidelity of generated images.
Technical Methods
We employed a multi-stage optimization pipeline focusing on three key techniques:
1. Mixed-Precision Quantization
We implemented 8-bit and 4-bit mixed-precision quantization. By selectively reducing the precision of weights and activations, we achieved significant memory savings.
- Binning, Rounding, and Clipping: Calibrated the quantization parameters (bin width, rounding mode, and clipping range) to minimize information loss.
- Result: Reduced memory footprint by up to 4x.
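The mechanics can be illustrated with a minimal sketch of per-tensor symmetric quantization, the scheme behind the clip/round/scale steps above. Function names and the calibration details are illustrative assumptions, not the project's actual code:

```python
import numpy as np

def quantize(weights, num_bits=8):
    """Symmetric uniform quantization: scale, round, clip to an integer grid.
    Illustrative helper -- the project's real pipeline is more involved."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(weights)) / qmax  # one scale factor per tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original fp32 weights.
    return q.astype(np.float32) * scale

w = np.array([0.51, -0.32, 0.0, 1.27], dtype=np.float32)
q, s = quantize(w, num_bits=8)
w_hat = dequantize(q, s)  # close to w; error bounded by ~scale/2 per weight
```

Storing `int8` instead of `float32` is where the 4x memory saving comes from; 4-bit layers halve that again at the cost of a coarser grid.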
2. Knowledge Distillation
We used knowledge distillation to train smaller "student" models that mimic the behavior of the larger "teacher" Stable Diffusion models. This allowed us to capture the generative capabilities of the full model in a much more compact architecture.
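For a diffusion model, distillation typically means the student matches the teacher's predicted noise at each timestep, blended with the ordinary denoising objective. A sketch of such a loss (the blending weight `alpha` and the MSE formulation are assumptions for illustration, not the project's exact recipe):

```python
import numpy as np

def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Blend of (a) matching the teacher's noise prediction and
    (b) the standard denoising loss against the true noise target.
    Illustrative only -- real training would use a framework's autograd."""
    distill = np.mean((student_pred - teacher_pred) ** 2)  # mimic the teacher
    task = np.mean((student_pred - target) ** 2)           # solve the task
    return alpha * distill + (1.0 - alpha) * task
```

Raising `alpha` pushes the student toward reproducing the teacher's behavior; lowering it anchors training to the ground-truth objective.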
3. Weight Pruning
We applied pruning to remove less important neural connections (weights close to zero).
- Strategy: Targeted a 10% reduction in weights.
- Impact: Achieved over 25% model size reduction when combined with other techniques, with minimal impact on output quality.
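Magnitude-based pruning of the kind described above can be sketched in a few lines; the function name and threshold selection are illustrative assumptions:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.10):
    """Zero out the smallest-magnitude fraction of weights.
    Hypothetical sketch of unstructured magnitude pruning."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([0.05, -0.9, 0.3, -0.01, 0.7, 0.2, -0.4, 0.6, 0.8, -0.55])
pruned = magnitude_prune(w, sparsity=0.10)  # zeros the smallest 10% (-0.01)
```

Zeroed weights compress well (sparse storage or entropy coding), which is how a modest 10% weight reduction contributes to the larger combined size savings.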
Results & Achievements
- Model Size Reduction: Achieved a 20-25% reduction in model size through our quantization and pruning strategies.
- Edge Deployment: Successfully demonstrated the optimized model running on an Orange Pi utilizing both CPU (ARM) and NPU acceleration.
- Performance: Validated that the optimized model could generate high-quality images within the constraints of edge hardware, bridging the gap between cloud-grade AI and edge computing.
Skills Applied
- Machine Learning: Generative Models (Stable Diffusion), Model Compression.
- Mathematics: Linear Algebra, Fixed-Point and Low-Precision Arithmetic.
- MLOps: Fine-tuning, Edge Deployment, NPU Optimization.