IntentTuner: An Interactive Framework for Integrating Human Intents in Fine-tuning Text-to-Image Generative Models

Xingchen Zeng, Ziyao Gao, Yilin Ye, Wei Zeng
2024-01-28
Abstract: Fine-tuning facilitates the adaptation of text-to-image generative models to novel concepts (e.g., styles and portraits), empowering users to forge creatively customized content. Recent efforts on fine-tuning focus on reducing training data and lightening computational overhead but neglect alignment with user intentions, particularly in manual curation of multi-modal training data and intent-oriented evaluation. Informed by a formative study with fine-tuning practitioners for comprehending user intentions, we propose IntentTuner, an interactive framework that intelligently incorporates human intentions throughout each phase of the fine-tuning workflow. IntentTuner enables users to articulate training intentions with imagery exemplars and textual descriptions, automatically converting them into effective data augmentation strategies. Furthermore, IntentTuner introduces novel metrics to measure user intent alignment, allowing intent-aware monitoring and evaluation of model training. Application exemplars and user studies demonstrate that IntentTuner streamlines fine-tuning, reducing cognitive effort and yielding superior models compared to the common baseline tool.
Human-Computer Interaction
What problem does this paper attempt to address?
The paper addresses how to align the fine-tuning of text-to-image generative models with user intent. The authors observe that existing fine-tuning methods, while effectively reducing the number of training images and the computational resources required, neglect the alignment between user intent and technical implementation, especially in the manual curation of multimodal training data and in intent-oriented evaluation.

To solve this problem, they propose IntentTuner, an interactive framework informed by a formative study with fine-tuning practitioners. It integrates user intent in three main ways:

1. **Understanding user intent**: users articulate their training intent through natural-language descriptions, example images, and interactive methods.
2. **Translating user intent into data strategies**: the framework automatically converts the expressed intent into matching data augmentation strategies.
3. **Monitoring and evaluating intent alignment**: new metrics measure how well the model aligns with user intent, enabling intent-aware monitoring and evaluation during training.

The paper also details the specific challenges users face in fine-tuning practice: the difficulty of translating abstract intent into concrete data strategies, the lack of effective methods for model selection and evaluation, and the lack of intuitive tools for monitoring training.
To address these challenges, IntentTuner provides an integrated system that unifies the fine-tuning and generation processes, allowing both expert and novice users to flexibly customize text-to-image generation models according to their intent, and it supports user-friendly monitoring and evaluation for intuitive model selection.
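The exact intent-alignment metrics IntentTuner uses are not described in this summary. Purely as an illustration of how such a score could drive intent-aware monitoring and checkpoint selection, the sketch below assumes that each generated sample and the user's textual intent have already been mapped into a shared embedding space (for example by a CLIP-style encoder, which is not included here). It scores alignment as the mean cosine similarity between sample embeddings and the intent embedding, then picks the checkpoint with the highest score. The function names `intent_alignment` and `select_checkpoint`, the checkpoint labels, and the toy two-dimensional vectors are all hypothetical and are not the paper's method.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def intent_alignment(image_embeddings: List[Vector], intent_embedding: Vector) -> float:
    """Hypothetical score: mean cosine similarity between each generated
    sample's embedding and the embedding of the user's intent description."""
    if not image_embeddings:
        raise ValueError("need at least one image embedding")
    scores = [cosine_similarity(img, intent_embedding) for img in image_embeddings]
    return sum(scores) / len(scores)


def select_checkpoint(samples_per_checkpoint: Dict[str, List[Vector]],
                      intent_embedding: Vector) -> str:
    """Return the checkpoint whose samples score highest on intent alignment."""
    return max(
        samples_per_checkpoint,
        key=lambda name: intent_alignment(samples_per_checkpoint[name], intent_embedding),
    )


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs (assumed, not real data).
    intent = [1.0, 0.0]
    checkpoints = {
        "step-500": [[0.0, 1.0], [1.0, 1.0]],
        "step-1000": [[1.0, 0.1], [0.9, 0.0]],
    }
    # Monitoring: report the alignment score of each saved checkpoint.
    for name, samples in checkpoints.items():
        print(name, round(intent_alignment(samples, intent), 3))
    # Selection: keep the checkpoint that best matches the stated intent.
    print("best:", select_checkpoint(checkpoints, intent))
```

Averaging over several samples per checkpoint, rather than scoring a single image, is a deliberate choice in this sketch: it makes the score less sensitive to the randomness of any one generation.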