Abstract: The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.

What problem does this paper attempt to address?

The paper addresses the evaluation of generative AI (GenAI) model outputs in computer vision during the early stages of model development. Existing tools primarily target dataset quality assurance and model explainability, leaving a gap in output evaluation during development. Developers often rely on subjective visual assessment, which may lack scalability and generalizability. Based on a formative study with GenAI model developers in an industrial setting, the paper presents GenLens, an interactive web application for annotating and analyzing GenAI model outputs. It supports an evaluation process that runs from pattern discovery through issue labeling and result aggregation to evidence-based insights. Specifically, GenLens provides a quantifiable way to overview and annotate failure cases, lets users customize issue tags and classifications, and aggregates annotations from multiple users to support collaboration, which in turn supports improvements in model training. A user study indicates that GenLens improves model developers' workflow and efficiency, with high satisfaction and a strong intent to integrate it into their practice, and that it helps them gain better insights for validation. The paper also discusses related work on generative AI, visual analytics for machine learning, and the challenges of evaluating GenAI model outputs. GenLens is designed around four goals: pattern discovery, issue identification, performance analysis, and insight summarization.
Finally, the paper presents the design iteration process of GenLens, including its key components (the discovery page, annotation modes, and analysis page), as well as its implementation and user feedback. User evaluations indicate that participants found GenLens useful for model evaluation and easy to use, and that they had a strong intention to use it. The research also offers two insights for GenAI model development: enhancing collaboration in the early model evaluation stage and promoting human-centric GenAI development. Future work may include optimizing GenLens for larger-scale data and different tasks, and evaluating how model outputs serve end users after deployment.
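
To make the idea of aggregating annotations from multiple users more concrete, the following is a minimal Python sketch. It is not the GenLens implementation: the `Annotation` schema, the tag names, and the `aggregate` and `failure_rate` helpers are all hypothetical and are only meant to illustrate how per-image issue tags from several annotators might be turned into quantifiable failure counts.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotator's issue tag on one generated image (hypothetical schema)."""
    image_id: str
    annotator: str
    tag: str


def aggregate(annotations):
    """Merge annotations from several users.

    Returns a pair:
      - images_per_tag: number of distinct images that received each tag
      - agreement: number of distinct annotators per (image_id, tag) pair
    """
    votes = defaultdict(set)  # (image_id, tag) -> set of annotator names
    for a in annotations:
        votes[(a.image_id, a.tag)].add(a.annotator)
    images_per_tag = Counter(tag for (_, tag) in votes)
    agreement = {key: len(names) for key, names in votes.items()}
    return images_per_tag, agreement


def failure_rate(images_per_tag, total_images):
    """Share of all evaluated images that carry each tag."""
    return {tag: n / total_images for tag, n in images_per_tag.items()}
```

Under these assumptions, a developer could call `aggregate` on the pooled annotations of a team and pass the per-tag counts to `failure_rate` to compare how often each issue type appears across model checkpoints.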

GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

GenAI Arena: An Open Evaluation Platform for Generative Models

GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

Interactive Visual Assessment for Text-to-Image Generation Models

Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches

Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs

The Metacognitive Demands and Opportunities of Generative AI

GenAssist: Making Image Generation Accessible

A Novel Evaluation Framework for Image2Text Generation

A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models

ExploreGen: Large Language Models for Envisioning the Uses and Risks of AI Technologies

Grounded Intuition of GPT-Vision's Abilities with Scientific Images

MarkupLens: An AI-Powered Tool to Support Designers in Video-Based Analysis at Scale

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

Model-based Maintenance and Evolution with GenAI: A Look into the Future

Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap

Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities

Generative AI in the Wild: Prospects, Challenges, and Strategies

Evaluating the Social Impact of Generative AI Systems in Systems and Society