Abstract: Meetings play a critical infrastructural role in the coordination of work. In recent years, due to the shift to hybrid and remote work, more meetings are moving to online computer-mediated spaces. This has led to new problems (e.g., more time spent in less engaging meetings) and new opportunities (e.g., automated transcription/captioning and recap support). Recent advances in large language models (LLMs) for dialogue summarization have the potential to improve the experience of meetings by reducing individuals' meeting load and increasing the clarity and alignment of meeting outputs. Despite this potential, these models face technological limitations due to long transcripts and an inability to capture diverse recap needs based on the user's context. To address these gaps, we design, implement, and evaluate a meeting recap system in context. We first conceptualize two salient recap representations: important highlights, and a structured, hierarchical minutes view. We develop a system to operationalize the representations with dialogue summarization as its building block. Finally, we evaluate the effectiveness of the system with seven users in the context of their work meetings. Our findings show promise in using LLM-based dialogue summarization for meeting recap and the need for both representations in different contexts. However, we find that LLM-based recap still lacks an understanding of what's personally relevant to participants, that it can miss important details, and that misattributions can be detrimental to group dynamics. We identify collaboration opportunities that a high-quality recap enables, such as a shared recap document. We report on implications for designing AI systems that partner with users, learning and improving from natural interactions to overcome the limitations related to personal relevance and summarization quality.

CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization

Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM

Can Large Language Models Serve as Evaluators for Code Summarization?

FineSurE: Fine-grained Summarization Evaluation using LLMs

Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system

What's Wrong? Refining Meeting Summaries with LLM Feedback

UniSumEval: Towards Unified, Fine-Grained, Multi-Dimensional Summarization Evaluation for LLMs

Tell me what I need to know: Exploring LLM-based (Personalized) Abstractive Multi-Source Meeting Summarization

UMSE: Unified Multi-scenario Summarization Evaluation

A Comparative Study of Quality Evaluation Methods for Text Summarization

Benchmarking Large Language Models for News Summarization

SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder

Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts

SummEval: Re-evaluating Summarization Evaluation

UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches

A Summary Evaluation Method Combining Linguistic Quality and Semantic Similarity

Investigating Consistency in Query-Based Meeting Summarization: A Comparative Study of Different Embedding Methods

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation

Realizing Video Summarization from the Path of Language-based Semantic Understanding

Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization

Rouge-C: A Fully Automated Evaluation Method for Multi-Document Summarization