Abstract: This paper addresses the learning and recognition of human behavior models from multimodal observation in a smart home environment. The proposed approach is part of a framework for acquiring a high-level contextual model of human behavior in an augmented environment. A 3-D video tracking system creates and tracks entities (persons) in the scene. A speech activity detector then analyzes audio streams from headset microphones and determines, for each entity, whether the entity is speaking. An ambient sound detector detects noises in the environment. An individual role detector derives basic activities such as “walking” or “interacting with table” from the entity properties extracted by the 3-D tracker. From the derived multimodal observations, different situations such as “aperitif” or “presentation” are learned and detected using statistical models (HMMs). The objective of the proposed general framework is twofold: the automatic offline analysis of human behavior recordings and the online detection of learned human behavior models. To evaluate the proposed approach, several multimodal recordings showing different situations were made. The obtained results, in particular for offline analysis, are very good, showing that multimodality as well as multiperson observation generation are beneficial for situation recognition.

Note to Practitioners: This paper was motivated by the problem of automatically recognizing human behavior and interactions in a smart home environment. The smart home environment is equipped with cameras and microphones that permit the observation of human activity in the scene. The objective is first to visualize the perceived human activities (e.g., for videoconferencing or surveillance of elderly people), and then to provide appropriate services based on these activities. We adopt a layered approach for human activity recognition in the environment. The layered framework is motivated by the human perception of human behavior in the scene (white box). The system first recognizes basic activities of individuals, called roles, such as “interacting with table” or “walking.” Then, based on the recognized individual roles, group situations such as “aperitif,” “presentation,” or “siesta” are recognized. In this paper, we describe an implementation that is based on a 3-D video tracking system, as well as speech activity detection using headset microphones. We evaluated the system for offline (a posteriori) situation classification and online (in scenario) situation recognition. A prototype system has been realized and installed at France Télécom R&D, visualizing current human behavior in the smart home to a distant user through a web interface. The detection of group dynamics and group formation, which is necessary for group situation recognition in (informal) real settings, remains an open issue.
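The offline classification step described above (one HMM per situation, with the best-scoring model chosen for a recorded sequence) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three discrete observation codes, the two-state models, and all probabilities are hypothetical, and the paper's actual features, HMM topology, and training procedure are not given in this abstract.

```python
import math

# Hypothetical discrete observation codes (not taken from the paper).
WALK, TABLE, SPEAK = 0, 1, 2


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then apply the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


def classify(obs, models):
    """Return the name of the situation model with the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hand-picked toy parameters: (start, transition, emission) per situation.
MODELS = {
    "aperitif": (
        [0.5, 0.5],
        [[0.8, 0.2], [0.2, 0.8]],
        [[0.6, 0.3, 0.1], [0.3, 0.6, 0.1]],
    ),
    "presentation": (
        [0.9, 0.1],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],
    ),
}

print(classify([SPEAK, SPEAK, SPEAK, WALK, SPEAK], MODELS))  # presentation
print(classify([WALK, TABLE, TABLE, WALK], MODELS))          # aperitif
```

In a real system the parameters would be learned from labeled recordings rather than written by hand, and online recognition would need to score a sliding window of recent observations instead of a complete sequence.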

Capture, Recognition, and Visualization of Human Semantic Interactions in Meetings

An Adaptive Vision System Toward Implicit Human Computer Interaction

3D Intuitive Gesture Interaction via Motion Sensing

Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey

Dynamic context driven human detection and tracking in meeting scenarios

Engagement Detection in Meetings

Human-to-Human Interaction Detection

Panoptic Studio: A Massively Multiview System for Social Interaction Capture

Natural Interaction Synthesizing in Virtual Teleconferencing

A Bayesian computer vision system for modeling human interactions

An LSTM-Based Approach for Understanding Human Interactions Using Hybrid Feature Descriptors Over Depth Sensors

Multi-Depth-Camera Sensing and Interaction in Smart Space

Audio-Visual Fused Online Context Analysis Toward Smart Meeting Room

Detecting Human Behavior Models From Multimodal Observation in a Smart Home

Speech Is Not Enough: Interpreting Nonverbal Indicators of Common Knowledge and Engagement

Recognizing Conversational Interaction Based On 3D Human Pose

Human Interaction Understanding With Joint Graph Decomposition and Node Labeling

Social Behavior Analysis in Visual Human Monitoring System: A Survey and Perspective

A Visual Human-Computer Interaction System Based on Hybrid Visual Model

Enhancing Human–Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition

MeetingVis: Visual Narratives to Assist in Recalling Meeting Context and Content