Abstract: Automatic action identification from video and kinematic data is an important machine learning problem with applications ranging from robotics to smart health. Most existing works focus on identifying coarse actions such as running, climbing, or cutting vegetables, which have relatively long durations and involve a complex series of motions. This is an important limitation for applications that require identification of more elemental motions at high temporal resolution. For example, in the rehabilitation of arm impairment after stroke, quantifying the training dose (number of repetitions) requires differentiating motions with sub-second durations. Our goal is to bridge this gap. To this end, we introduce a large-scale, multimodal dataset, StrokeRehab, as a new action-recognition benchmark that includes elemental short-duration actions labeled at a high temporal resolution. StrokeRehab consists of high-quality inertial measurement unit sensor and video data from 51 stroke-impaired patients and 20 healthy subjects performing activities of daily living such as feeding and brushing teeth. Because it contains data from both healthy and impaired individuals, StrokeRehab can be used to study the influence of distribution shift in action-recognition tasks. When evaluated on StrokeRehab, current state-of-the-art models for action segmentation produce noisy predictions, which reduces their accuracy in identifying the corresponding sequence of actions. To address this, we propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques, which is based on a sequence-to-sequence model that directly predicts the sequence of actions. This approach outperforms current state-of-the-art methods on StrokeRehab, as well as on the standard benchmark datasets 50Salads, Breakfast, and Jigsaws.
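To illustrate why noisy frame-level predictions hurt sequence-level accuracy, the following minimal Python sketch collapses per-frame labels into an action sequence and compares it with the ground truth using edit distance. It is an illustration under stated assumptions, not the evaluation code of the paper: the label names (`reach`, `transport`, `idle`), the 12-frame toy signal, and the helper functions `collapse` and `edit_distance` are all hypothetical.

```python
from itertools import groupby


def collapse(frame_labels):
    """Merge runs of identical consecutive frame labels into single actions."""
    return [label for label, _ in groupby(frame_labels)]


def edit_distance(a, b):
    """Levenshtein distance between two action sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical ground truth: three short actions, four frames each.
truth = ["reach"] * 4 + ["transport"] * 4 + ["idle"] * 4

# Hypothetical segmentation output with two isolated mislabeled frames.
noisy = list(truth)
noisy[2] = "idle"
noisy[9] = "reach"

frame_accuracy = sum(t == p for t, p in zip(truth, noisy)) / len(truth)
true_actions = collapse(truth)
pred_actions = collapse(noisy)
errors = edit_distance(pred_actions, true_actions)

print(f"frame accuracy: {frame_accuracy:.2f}")
print(f"true actions:   {true_actions}")
print(f"pred actions:   {pred_actions}")
print(f"edit distance:  {errors}")
```

In this toy case only two of twelve frames are wrong, yet the collapsed prediction contains seven actions instead of three, so a repetition count derived from it would be badly inflated. A model that outputs the action sequence directly, as the abstract proposes, avoids this collapsing step.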