GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

Tica Lin, Hanspeter Pfister, Jui-Hsien Wang

2024-02-06

Abstract: The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.

Artificial Intelligence, Human-Computer Interaction

What problem does this paper attempt to address?

The paper addresses the problem of evaluating rapidly developing generative AI (GenAI) models in computer vision. Existing tools mainly target dataset quality assurance and model explainability, leaving a significant gap in the evaluation of model outputs during the early stages of GenAI model development. Developers often rely on subjective visual assessment, which may lack scalability and generalizability. GenLens is an interactive web application designed to facilitate annotation and analysis of GenAI model outputs. It provides an evaluation pipeline that covers pattern discovery, annotation of issues, analysis of model performance through aggregated results, and derivation of insights from quantifiable evidence to guide model training. The tool lets users overview and annotate failure cases quantitatively, customize issue tags and categories, and aggregate annotations from multiple users to support collaboration. The authors conducted a formative study with GenAI model developers in an industrial setting to inform the design, and a subsequent user study with model developers found that GenLens improved their workflow, as evidenced by high satisfaction and a strong intent to integrate it into their practice. The work underscores the importance of robust evaluation tools in the early stages of GenAI development, contributing to fair and high-quality GenAI models.
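As a rough illustration of what "aggregating annotations from multiple users" into quantifiable evidence could look like, the sketch below counts how many distinct outputs each issue tag was applied to and what share of outputs was flagged at all. It is not GenLens's implementation: the `(user, image_id, tag)` record shape and the function names `aggregate_annotations` and `flagged_rate` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def aggregate_annotations(annotations):
    """Summarize issue tags across annotators.

    `annotations` is an iterable of (user, image_id, tag) tuples
    (a hypothetical record shape, not taken from the paper).
    Returns {tag: {"images": distinct images carrying the tag,
                   "votes": distinct (user, image) pairs that applied it}}.
    """
    images_by_tag = defaultdict(set)
    votes_by_tag = Counter()
    seen = set()
    for record in annotations:
        if record in seen:
            # Ignore an exact repeat of the same annotation by the same user.
            continue
        seen.add(record)
        _user, image_id, tag = record
        images_by_tag[tag].add(image_id)
        votes_by_tag[tag] += 1
    return {
        tag: {"images": len(images_by_tag[tag]), "votes": votes_by_tag[tag]}
        for tag in votes_by_tag
    }


def flagged_rate(annotations, total_images):
    """Fraction of generated images that received at least one issue tag."""
    flagged = {image_id for _user, image_id, _tag in annotations}
    return len(flagged) / total_images


# Hypothetical annotations from two developers over a batch of four outputs.
sample = [
    ("alice", "img1", "extra finger"),
    ("bob", "img1", "extra finger"),
    ("alice", "img2", "blurry"),
    ("alice", "img2", "blurry"),  # accidental duplicate, counted once
]
summary = aggregate_annotations(sample)
rate = flagged_rate(sample, total_images=4)
```

Keeping image counts separate from vote counts is a deliberate choice in this sketch: the first indicates how widespread an issue is across outputs, the second how many annotators agree on it.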

GenAI Arena: An Open Evaluation Platform for Generative Models

GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches

Interactive Visual Assessment for Text-to-Image Generation Models

Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs

A Novel Evaluation Framework for Image2Text Generation

GenAssist: Making Image Generation Accessible

The Metacognitive Demands and Opportunities of Generative AI

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

The Power of Generative AI: A Review of Requirements, Models, Input–Output Formats, Evaluation Metrics, and Challenges

A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models

Model-based Maintenance and Evolution with GenAI: A Look into the Future

Enformation Theory: A Framework for Evaluating Genomic AI

Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities

Grounded Intuition of GPT-Vision's Abilities with Scientific Images

Generalized Visual Quality Assessment of GAN-Generated Face Images

Cutting Through the Confusion and Hype: Understanding the True Potential of Generative AI

ExploreGen: Large Language Models for Envisioning the Uses and Risks of AI Technologies

Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap