DesignMinds: Enhancing Video-Based Design Ideation with Vision-Language Model and Context-Injected Large Language Model

Tianhao He, Andrija Stankovic, Evangelos Niforatos, Gerd Kortuem
2024-11-06
Abstract: Ideation is a critical component of video-based design (VBD), where videos serve as the primary medium for design exploration and inspiration. The emergence of generative AI offers considerable potential to enhance this process by streamlining video analysis and facilitating idea generation. In this paper, we present DesignMinds, a prototype that integrates a state-of-the-art Vision-Language Model (VLM) with a context-enhanced Large Language Model (LLM) to support ideation in VBD. To evaluate DesignMinds, we conducted a between-subject study with 35 design practitioners, comparing its performance to a baseline condition. Our results demonstrate that DesignMinds significantly enhances the flexibility and originality of ideation, while also increasing task engagement. Importantly, the introduction of this technology did not negatively impact user experience, technology acceptance, or usability.
Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses the challenges of ideation in video-based design (VBD), and in particular how generative AI can support the process of deriving design ideas from video. The authors point out that traditional methods for turning video content into design ideas have the following problems:

1. **High investment of time and effort**: Generating novel design ideas from videos requires a great deal of time and effort.
2. **Dependence on designers' experience**: The process relies heavily on a designer's experience and expertise, which is especially challenging for novice designers.
3. **Risk of design fixation**: Over-reliance on specific knowledge or one's own experience can narrow the range of design outcomes.

To address these problems, the paper proposes a prototype system named DesignMinds, which combines a state-of-the-art Vision-Language Model (VLM) with a context-enhanced Large Language Model (LLM) to provide AI assistance for ideation in video-based design. The specific goals are:

- Improving the flexibility and originality of ideation.
- Enhancing task engagement.
- Ensuring that user experience, technology acceptance, and usability are not negatively affected.

DesignMinds uses the VLM for video understanding and the LLM for idea recommendations grounded in the design context, with the aim of helping designers process video information more effectively and generate more diverse, higher-quality design ideas. The research also examines the tool's effect on user experience and technology acceptance, to check that it does not impose additional burden or discomfort on users.
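
The flow described above (a VLM turns video into text, then an LLM receives that text together with injected design context) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not describe how DesignMinds is built, so every name here (`describe_video`, `select_context`, `build_prompt`, `Snippet`, `CORPUS`) is hypothetical, the VLM call is replaced by a canned string, and the word-overlap ranking is only a toy stand-in for whatever context selection the real system uses.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Snippet:
    """A piece of design knowledge that could be injected as context (hypothetical)."""
    source: str
    text: str


# Hypothetical design-knowledge corpus, invented purely for illustration.
CORPUS = [
    Snippet("handbook-a", "Place the signal button within reach of a cyclist at the crossing"),
    Snippet("handbook-b", "Use high contrast colors for kitchen appliance displays"),
    Snippet("handbook-c", "Reduce effort needed to press controls while moving"),
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def describe_video(video_path: str) -> str:
    """Stand-in for a VLM call: returns a canned description (assumption)."""
    return "A cyclist stops at a crossing and struggles to press the signal button."


def select_context(description: str, corpus: list[Snippet], k: int = 2) -> list[Snippet]:
    """Return the k snippets sharing the most words with the description."""
    words = tokens(description)
    ranked = sorted(corpus, key=lambda s: len(words & tokens(s.text)), reverse=True)
    return ranked[:k]


def build_prompt(description: str, context: list[Snippet]) -> str:
    """Assemble the text that would be sent to an LLM (the LLM call itself is omitted)."""
    context_block = "\n".join(f"[{s.source}] {s.text}" for s in context)
    return (
        "You are assisting a designer with ideation.\n"
        f"Video description:\n{description}\n\n"
        f"Relevant design knowledge:\n{context_block}\n\n"
        "Suggest several distinct design ideas."
    )


if __name__ == "__main__":
    description = describe_video("demo.mp4")
    context = select_context(description, CORPUS, k=2)
    print(build_prompt(description, context))
```

Keeping the video description and the injected context as separate inputs to `build_prompt` makes it easy to swap in a real VLM or a different retrieval method without touching the prompt assembly.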