Abstract: To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output. A critical aspect of compression in practice is model comparison, including tracking many compression experiments, identifying subtle changes in model behavior, and negotiating complex accuracy-efficiency trade-offs. However, existing compression tools poorly support comparison, leading to tedious and, sometimes, incomplete analyses spread across disjoint tools. To support real-world comparative workflows, we develop an interactive visual system called Compress and Compare. Within a single interface, Compress and Compare surfaces promising compression strategies by visualizing provenance relationships between compressed models and reveals compression-induced behavior changes by comparing models' predictions, weights, and activations. We demonstrate how Compress and Compare supports common compression analysis tasks through two case studies: debugging failed compression on generative language models and identifying compression artifacts in image classification models. We further evaluate Compress and Compare in a user study with eight compression experts, illustrating its potential to provide structure to compression workflows, help practitioners build intuition about compression, and encourage thorough analysis of compression's effect on model behavior. Through these evaluations, we identify compression-specific challenges that future visual analytics tools should consider and Compress and Compare visualizations that may generalize to broader model comparison tasks.

What problem does this paper attempt to address?

The paper aims to address several key challenges in the process of compressing machine learning models and proposes an interactive visualization system called "Compress and Compare" to help researchers analyze and compare the results of compression experiments. Specifically:

1. **Determining the optimal compression strategy is time-consuming and task-specific** (C1): The paper points out that there is no universal compression strategy that fits all situations. Even experienced experts need to conduct multiple experiments to find the best balance between efficiency and accuracy. Existing tools typically support the exploration of only single experimental results rather than evaluating the design space of all possible experiments.
2. **Compression requires trade-offs between multiple metrics** (C2): Researchers often need to make trade-off decisions between resource objectives such as memory, time, and accuracy. These budgets are usually set and adjusted based on how the model affects user experience. Therefore, tools are needed to help quickly evaluate different model variants and make multi-dimensional trade-off choices.
3. **Top-level metrics may obscure significant differences between compressed models** (C3): Although compressed models may perform similarly to the original models in overall performance, their behavior may change, introducing new errors or biases. However, these changes are not always reflected in evaluation metrics, thus requiring human involvement to uncover potential issues.
4. **Compression may have hard-to-debug impacts on the model's internals** (C4): Compression algorithms sometimes do not operate according to expected rules, leading to hard-to-track changes in the model's internal structure. To ensure that compression only affects the desired parts, researchers often need to identify errors and bottlenecks layer by layer.

By developing the "Compress and Compare" system, the authors hope to address the above challenges, enabling users to track compression experiments, compare model performance and behavior, and gain insights into the impact of compression on the model's internals within a single interface.
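To make challenges C2 and C3 concrete, here is a minimal Python sketch. It is not taken from the paper or from the Compress and Compare system: the model names, sizes, accuracies, labels, and predictions are invented for illustration, and the helper functions `pareto_front` and `behavior_diff` are hypothetical. The first function keeps only the models that are not dominated on size and accuracy (C2). The second shows how two models with identical accuracy can still disagree on many individual inputs (C3).

```python
from typing import Dict, List, Tuple

# Hypothetical experiment results: name -> (size in MB, accuracy).
# All values are made up for illustration.
MODELS: Dict[str, Tuple[float, float]] = {
    "base": (100.0, 0.92),
    "quant8": (25.0, 0.91),
    "prune50": (50.0, 0.90),
    "prune50_quant8": (12.5, 0.88),
    "prune90": (10.0, 0.70),
    "bad_quant": (25.0, 0.80),
}


def pareto_front(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of models not dominated on (smaller size, higher accuracy)."""
    front = []
    for name, (size, acc) in models.items():
        dominated = any(
            o_size <= size and o_acc >= acc and (o_size < size or o_acc > acc)
            for other, (o_size, o_acc) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


def behavior_diff(labels: List[int], preds_a: List[int], preds_b: List[int]) -> dict:
    """Compare two models instance by instance instead of by accuracy alone."""
    n = len(labels)
    acc_a = sum(p == y for p, y in zip(preds_a, labels)) / n
    acc_b = sum(p == y for p, y in zip(preds_b, labels)) / n
    # Fraction of inputs on which the two models predict different classes.
    flipped = sum(a != b for a, b in zip(preds_a, preds_b)) / n
    # Inputs that model A gets right but model B gets wrong.
    new_errors = [
        i for i, (y, a, b) in enumerate(zip(labels, preds_a, preds_b))
        if a == y and b != y
    ]
    return {"acc_a": acc_a, "acc_b": acc_b, "flipped": flipped, "new_errors": new_errors}


# Hypothetical labels and predictions for an original (A) and a compressed (B) model.
LABELS = [0, 1, 1, 0, 1, 0, 1, 1]
PREDS_A = [0, 1, 1, 0, 1, 0, 0, 0]
PREDS_B = [0, 1, 0, 1, 1, 0, 1, 1]

if __name__ == "__main__":
    print(pareto_front(MODELS))
    print(behavior_diff(LABELS, PREDS_A, PREDS_B))
```

In this toy data both models score 0.75 accuracy, yet they disagree on half of the inputs, which is the kind of difference a single top-level metric would hide.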

Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments

Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment

VeriCompress: A Tool to Streamline the Synthesis of Verified Robust Compressed Neural Networks from Scratch

Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets

Compressive Visual Representations

The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment

LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment

Finding Deviated Behaviors of the Compressed DNN Models for Image Classifications

Pragmatic Image Compression for Human-in-the-Loop Decision-Making

Accuracy is Not All You Need

Hyper-Compression: Model Compression via Hyperfunction

AutoMC: Automated Model Compression based on Domain Knowledge and Progressive search strategy

Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Kernel-wise difference minimization for convolutional neural network compression in metaverse