Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments

Angie Boggust, Venkatesh Sivaraman, Yannick Assogba, Donghao Ren, Dominik Moritz, Fred Hohman
2024-08-07
Abstract: To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output. A critical aspect of compression in practice is model comparison, including tracking many compression experiments, identifying subtle changes in model behavior, and negotiating complex accuracy-efficiency trade-offs. However, existing compression tools poorly support comparison, leading to tedious and, sometimes, incomplete analyses spread across disjoint tools. To support real-world comparative workflows, we develop an interactive visual system called Compress and Compare. Within a single interface, Compress and Compare surfaces promising compression strategies by visualizing provenance relationships between compressed models and reveals compression-induced behavior changes by comparing models' predictions, weights, and activations. We demonstrate how Compress and Compare supports common compression analysis tasks through two case studies, debugging failed compression on generative language models and identifying compression artifacts in image classification models. We further evaluate Compress and Compare in a user study with eight compression experts, illustrating its potential to provide structure to compression workflows, help practitioners build intuition about compression, and encourage thorough analysis of compression's effect on model behavior. Through these evaluations, we identify compression-specific challenges that future visual analytics tools should consider and Compress and Compare visualizations that may generalize to broader model comparison tasks.
Human-Computer Interaction, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper aims to address several key challenges in the process of compressing machine learning models and proposes an interactive visualization system called "Compress and Compare" to help researchers analyze and compare the results of compression experiments. Specifically:
1. **Determining the optimal compression strategy is time-consuming and task-specific** (C1): The paper points out that there is no universal compression strategy that fits all situations. Even experienced experts need to conduct multiple experiments to find the best balance between efficiency and accuracy. Existing tools typically support the exploration of only single experimental results rather than evaluating the design space of all possible experiments.
2. **Compression requires trade-offs between multiple metrics** (C2): Researchers often need to make trade-off decisions between resource objectives such as memory, time, and accuracy. These budgets are usually set and adjusted based on how the model affects user experience. Therefore, tools are needed to help quickly evaluate different model variants and make multi-dimensional trade-off choices.
3. **Top-level metrics may obscure significant differences between compressed models** (C3): Although compressed models may perform similarly to the original models in overall performance, their behavior may change, introducing new errors or biases. However, these changes are not always reflected in evaluation metrics, thus requiring human involvement to uncover potential issues.
4. **Compression may have hard-to-debug impacts on the model's internals** (C4): Compression algorithms sometimes do not operate according to expected rules, leading to hard-to-track changes in the model's internal structure. To ensure that compression only affects the desired parts, researchers often need to identify errors and bottlenecks layer by layer.
By developing the "Compress and Compare" system, the authors hope to address the above challenges, enabling users to track compression experiments, compare model performance and behavior, and gain insights into the impact of compression on the model's internals within a single interface.
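To make the multi-metric trade-off in C2 concrete, the following is a minimal Python sketch of one common way to narrow down candidates: keeping only the model variants that no other variant beats on every metric (a Pareto front). This is not taken from the paper or from the Compress and Compare system; the `Variant` class, the variant names, and all metric values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A hypothetical compressed-model record (illustrative fields only)."""
    name: str
    accuracy: float    # higher is better
    size_mb: float     # lower is better
    latency_ms: float  # lower is better


def dominates(a: Variant, b: Variant) -> bool:
    """Return True if `a` is at least as good as `b` on every metric
    and strictly better on at least one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.size_mb <= b.size_mb
        and a.latency_ms <= b.latency_ms
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.size_mb < b.size_mb
        or a.latency_ms < b.latency_ms
    )
    return no_worse and strictly_better


def pareto_front(variants: list[Variant]) -> list[Variant]:
    """Keep the variants that are not dominated by any other variant."""
    return [
        v for v in variants
        if not any(dominates(other, v) for other in variants)
    ]


# Made-up numbers, not results from the paper.
variants = [
    Variant("baseline", accuracy=0.92, size_mb=100.0, latency_ms=40.0),
    Variant("int8_quantized", accuracy=0.91, size_mb=25.0, latency_ms=22.0),
    Variant("pruned_50", accuracy=0.90, size_mb=50.0, latency_ms=30.0),
    Variant("pruned_90", accuracy=0.78, size_mb=10.0, latency_ms=12.0),
]

front_names = [v.name for v in pareto_front(variants)]
print(front_names)
```

In this toy data, `pruned_50` drops out because `int8_quantized` is more accurate, smaller, and faster. Choosing among the remaining variants still depends on the memory, time, and accuracy budgets, and, as C3 notes, top-level metrics alone may hide behavioral differences between them.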