Abstract: The concept of an intelligent augmented reality (AR) assistant has significant, wide-ranging applications, with potential uses in the medical, military, and mechanical domains. Such an assistant must be able to perceive the environment and actions, reason about the environment state in relation to a given task, and seamlessly interact with the task performer. These interactions typically involve an AR headset equipped with sensors which capture video, audio, and haptic feedback. Previous work has sought to facilitate the development of intelligent AR assistants by visualizing these sensor data streams in conjunction with the assistant's perception and reasoning model outputs. However, existing visual analytics systems do not focus on user modeling or include biometric data, and are only capable of visualizing a single task session for a single performer at a time. Moreover, they typically assume that a task involves linear progression from one step to the next. We propose a visual analytics system that allows users to compare performance across multiple task sessions, focusing on non-linear tasks where different step sequences can lead to success. In particular, we design visualizations for understanding user behavior through functional near-infrared spectroscopy (fNIRS) data as a proxy for perception, attention, and memory, as well as corresponding motion data (acceleration, angular velocity, and gaze). We distill these insights into embedding representations that allow users to easily select groups of sessions with similar behaviors. We provide two case studies that demonstrate how to use these visualizations to gain insights about task performance using data collected during helicopter copilot training tasks. Finally, we evaluate our approach through an in-depth examination of a think-aloud experiment with five domain experts.

Learning a Visually Grounded Memory Assistant

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

Encode-Store-Retrieve: Augmenting Human Memory through Language-Encoded Egocentric Perception

Memoro: Using Large Language Models to Realize a Concise Interface for Real-Time Memory Augmentation

A Novel Neural Multi-Store Memory Network for Autonomous Visual Navigation in Unknown Environment

MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

HuBar: A Visual Analytics Tool to Explore Human Behavior based on fNIRS in AR Guidance Systems

On a Gamified Brain-Computer Interface for Cognitive Training of Spatial Working Memory

KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems

Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People

Semantic HELM: A Human-Readable Memory for Reinforcement Learning

Enhancing Memory Recall Via an Intelligent Social Contact Management System

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Memorize What Matters: Emergent Scene Decomposition from Multitraverse