CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization

Ziwei Gong, Lin Ai, Harshsaiprasad Deshpande, Alexander Johnson, Emmy Phung, Zehui Wu, Ahmad Emami, Julia Hirschberg
2024-09-17
Abstract: Large Language Models (LLMs) have spurred interest in automatic evaluation methods for summarization, offering a faster, more cost-effective alternative to human evaluation. However, existing methods often fall short when applied to complex tasks such as long-context summarization and dialogue-based meeting summarization. In this paper, we introduce CREAM (Comparison-Based Reference-Free Elo-Ranked Automatic Evaluation for Meeting Summarization), a novel framework that addresses the unique challenges of evaluating meeting summaries. CREAM combines chain-of-thought reasoning with key facts alignment to assess the conciseness and completeness of model-generated summaries without requiring reference summaries. By employing an Elo ranking system, our approach provides a robust mechanism for comparing the quality of different models or prompt configurations.
Computation and Language
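The abstract names an Elo ranking system but does not give its parameters, so the following is only a minimal sketch of a standard Elo update over pairwise comparisons. The K-factor of 32, the initial rating of 1000, the 400-point scale, and the names `expected_score`, `update_elo`, `model_a`, and `model_b` are illustrative assumptions, not details taken from the paper. How CREAM decides the outcome of each comparison is also not modeled here.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    comparisons: Iterable[Tuple[str, str, float]],
    k: float = 32.0,         # assumed K-factor, not specified in the abstract
    initial: float = 1000.0  # assumed starting rating for unseen systems
) -> Dict[str, float]:
    """Apply Elo updates for a sequence of pairwise comparisons.

    Each comparison is (a, b, score_a), where score_a is 1.0 if a's summary
    was preferred, 0.0 if b's was preferred, and 0.5 for a tie.
    Returns a new dict and leaves the input ratings unchanged.
    """
    ratings = dict(ratings)
    for a, b, score_a in comparisons:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        # The two updates are equal and opposite, so total rating is conserved.
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return ratings


if __name__ == "__main__":
    # Hypothetical outcomes: model_a preferred twice, model_b once.
    outcomes = [
        ("model_a", "model_b", 1.0),
        ("model_a", "model_b", 1.0),
        ("model_a", "model_b", 0.0),
    ]
    final = update_elo({}, outcomes)
    for name, rating in sorted(final.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because each update depends on the current ratings, the final ranking can vary with the order in which comparisons are processed; shuffling the comparisons and averaging over several passes is one common way to reduce that sensitivity.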