Abstract: We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, nonoriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode only spatial relations. Our models have several notable properties: 1) they efficiently model articulation by sharing computation across similar warps, 2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and 3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity), with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly, and show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid, articulated templates that are trained independently of each other; in contrast, we present an extensive diagnostic evaluation suggesting that flexible structure and joint training are crucial for strong performance. We present experimental results on standard benchmarks suggesting that our approach is the state-of-the-art system for pose estimation, outperforming past work on the challenging Parse and Buffy datasets while being orders of magnitude faster.
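The claim that tree-structured relations can be optimized exactly with dynamic programming can be illustrated with a small max-sum sketch. It assumes a score of the kind used in mixture-of-parts pictorial structures: a local appearance term for each part i in mixture type t_i at location p_i, plus, for every parent-child edge, a co-occurrence bias b(t_parent, t_child) and a quadratic spring w(t_parent, t_child) · [dx, dx², dy, dy²] on the child's offset from its parent. The function names `best_pose` and `spring`, the nested-list layouts, and the small shared set of candidate locations are illustrative assumptions, not the authors' implementation. The inner maximization is brute force over child locations; practical detectors typically replace it with a generalized distance transform.

```python
from typing import List, Sequence, Tuple

Loc = Tuple[int, int]


def spring(w: Sequence[float], child: Loc, parent: Loc) -> float:
    """Deformation score w . [dx, dx^2, dy, dy^2] for the child-minus-parent offset."""
    dx = child[0] - parent[0]
    dy = child[1] - parent[1]
    return w[0] * dx + w[1] * dx * dx + w[2] * dy + w[3] * dy * dy


def best_pose(parent, locs, unary, pair_bias, deform):
    """Exact max-sum inference over a tree of parts, each with mixture types.

    parent[i]            index of part i's parent; parent[0] == -1 and parent[i] < i.
    locs[l]              (x, y) candidate location l, shared by all parts.
    unary[i][t][l]       appearance score (incl. local bias) of part i, mixture t, at l.
    pair_bias[i][tp][t]  co-occurrence score of child mixture t with parent mixture tp.
    deform[i][tp][t]     four spring weights for child i given mixtures (tp, t).
    Returns (best total score, [(mixture, location index) for each part]).
    """
    n, num_l = len(parent), len(locs)
    assert parent[0] == -1 and all(0 <= parent[i] < i for i in range(1, n))
    # score[i][t][l]: best score of the subtree rooted at i with part i fixed at (t, l).
    score = [[list(row) for row in unary[i]] for i in range(n)]
    # arg[i][tp][lp]: best (t, l) of part i given its parent's (tp, lp), for backtracking.
    arg: List = [None] * n
    for i in range(n - 1, 0, -1):  # children have larger indices, so they finish first
        p = parent[i]
        arg[i] = [[None] * num_l for _ in score[p]]
        for tp in range(len(score[p])):
            for lp in range(num_l):
                best, best_tl = float("-inf"), None
                for t in range(len(score[i])):
                    for l in range(num_l):
                        s = (score[i][t][l] + pair_bias[i][tp][t]
                             + spring(deform[i][tp][t], locs[l], locs[lp]))
                        if s > best:
                            best, best_tl = s, (t, l)
                score[p][tp][lp] += best  # pass the child's max-message up to the parent
                arg[i][tp][lp] = best_tl
    # Pick the best root configuration, then backtrack from parents to children.
    root = max(((t, l) for t in range(len(score[0])) for l in range(num_l)),
               key=lambda tl: score[0][tl[0]][tl[1]])
    pose: List[Tuple[int, int]] = [root] + [(0, 0)] * (n - 1)
    for i in range(1, n):
        tp, lp = pose[parent[i]]
        pose[i] = arg[i][tp][lp]
    return score[0][root[0]][root[1]], pose
```

Because the spring weights and biases are indexed by both the parent's and the child's mixture type, the same recursion covers the co-occurrence relations and the mixture-dependent geometry described above. With N parts, K mixtures per part, and L candidate locations, this sketch runs in O(N K² L²) time, compared with (K L)^N configurations for exhaustive search.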

Combination Features and Models for Human Detection

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

A Two-Stage Human Body Detector on Depth Data

Human Detection Method Based on Multi-Part Detector and Multi-Instance Learning

An HOG-CT Human Detector with Histogram-Based Search

A novel hybrid human detection system

MPLBoost-based Mixture Model for Effective Human Detection with Deformable Part Model

Human Detection Based on Fusion of Histograms of Oriented Gradients and Main Partial Features

Human Articulated Body Recognition Method in High-Resolution Monitoring Images

Locality guided cross-modal feature aggregation and pixel-level fusion for multispectral pedestrian detection

Human Detection Aided by Deeply Learned Semantic Masks

GigaHumanDet: Exploring Full-Body Detection on Gigapixel-Level Images

GLCM-Based Feature Combination for Extraction Model Optimization in Object Detection Using Machine Learning

Real-time human detection based on gentle MILBoost with variable granularity HOG-CSLBP

Boosted parametric model for human detection

Weighted Deformable Part Model for Robust Human Detection

Fast Human Detection Using Node-Combined Part Detector

Human Tracking Algorithm Based on Model Fusion

Detector-in-Detector: Multi-Level Analysis for Human-Parts

Object Detection via Aspect Ratio and Context Aware Region-based Convolutional Networks

Articulated Human Detection with Flexible Mixtures of Parts