Abstract: Falls are the biggest threat to seniors, with significant emotional, physical and financial implications. They are the major cause of serious injuries, disabilities, hospitalizations and even death, especially for elderly people living alone. Timely detection makes it possible to provide immediate medical assistance to the injured person and to avoid harmful consequences. A great number of vision-based techniques have been proposed, relying on cameras installed in everyday environments. Recently, deep learning has revolutionized these techniques, mostly through convolutional neural networks (CNNs). In this paper, we propose weighted multi-stream deep convolutional neural networks that exploit the rich multimodal data provided by RGB-D cameras. Our method automatically detects fall events and sends a help request to caregivers. Our contribution is three-fold. First, we build a new architecture composed of four separate CNN streams, one for each modality. The first modality is based on a single combined RGB and depth image that encodes static appearance information: the RGB image captures color and texture, while the depth image handles illumination variations. In contrast to the first modality, which lacks contextual information about previous and next frames, the second modality characterizes variations in human shape. After background subtraction and person recognition, the human silhouette is extracted and stacked to define a history of binary motion image (HBMI). The last two modalities are used to better discriminate motion information: stacked amplitude and oriented flow are used in addition to the stacked optical flow field to describe, respectively, the velocity, the direction and the displacement of motion. The main motivation behind the use of these multimodal data is to combine complementary information such as motion, shape, RGB and depth appearance to achieve more accurate detection than is possible with a single modality.
Our second contribution is the combination of the four streams to generate the final fall detection decision. We evaluate early and late fusion strategies and define the weight of each modality according to its overall performance in the system. Based on our experiments, weighted score fusion is finally adopted. Third, we apply transfer learning and data augmentation to increase the amount of training data, avoid overfitting and improve accuracy. Experiments conducted on publicly available standard datasets demonstrate the effectiveness of the proposed method compared to existing methods.
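The weighted score fusion described above can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the code below assumes, purely for illustration, that each stream's weight is its validation accuracy normalized so that the weights sum to one, and that each stream outputs softmax scores over two classes. The function names, the class order (index 0 = fall, index 1 = no fall) and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np


def accuracy_weights(stream_accuracies):
    """Turn per-stream accuracies into weights that sum to 1.

    Assumption for illustration only: weight proportional to accuracy.
    """
    acc = np.asarray(stream_accuracies, dtype=float)
    return acc / acc.sum()


def weighted_score_fusion(stream_scores, weights):
    """Fuse per-stream class scores with a weighted sum (late fusion).

    stream_scores: array-like of shape (n_streams, n_classes), one row of
                   softmax scores per stream.
    weights:       array-like of shape (n_streams,), summing to 1.
    Returns the fused score vector and the index of the winning class.
    """
    scores = np.asarray(stream_scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    fused = w @ scores  # weighted sum over streams, shape (n_classes,)
    return fused, int(np.argmax(fused))


# Hypothetical validation accuracies for the four streams:
# RGB-D appearance, HBMI shape, amplitude/oriented flow, optical flow field.
weights = accuracy_weights([0.90, 0.92, 0.94, 0.96])

# Hypothetical softmax outputs [fall, no fall] for one video clip.
scores = [
    [0.70, 0.30],
    [0.55, 0.45],
    [0.40, 0.60],
    [0.85, 0.15],
]
fused, label = weighted_score_fusion(scores, weights)
print(fused, "fall" if label == 0 else "no fall")
```

Because the weights sum to one and each stream's scores sum to one, the fused vector is again a valid score distribution, so the final decision is a plain argmax over it.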

Fall Detection in Multi-Camera Surveillance Videos

Deep Learning Based Abnormal Behavior Detection for Elderly Healthcare Using Consumer Network Cameras

Collaborative Fall Detection Using a Wearable Device and a Companion Robot

Fall detection with a non-intrusive and first-person vision approach

Multi-camera, multi-person, and real-time fall detection using long short term memory

Video-based Fall Detection for Seniors with Human Pose Estimation

A Real-time Fall Detection System Using ToF Depth Images

A Novel Multi-Cue Integration System for Efficient Human Fall Detection

Multi Visual Modality Fall Detection Dataset

Elderly fall detection based on multi-stream deep convolutional networks

A human fall detection framework based on multi-camera fusion

Multimodal fall detection for solitary individuals based on audio-video decision fusion processing

Fall Detection in Elderly Care System Based on Group of Pictures

Fall detection using multimodal data

Semantic segmentation-based system for fall detection and post-fall posture classification

Fall Detection Based on Body Part Tracking Using a Depth Camera

An Edge-device Based Fast Fall Detection Using Spatio-temporal Optical Flow Model

Fall Detection Method for Infrared Videos Based on Spatial-Temporal Graph Convolutional Network

Fall detection system with portable camera

Elderly Fall Detection Based on GCN-LSTM Multi-Task Learning Using Nursing Aids Integrated with Multi-Array Flexible Tactile Sensors

Advancing Fall Detection Utilizing Skeletal Joint Image Representation and Deformable Layers