Crowded Scene Understanding by Deeply Learned Attributes

Jing Shao, Kai Kang, Chen Change Loy, Xiaogang Wang
2015-01-01
Abstract: During the last decade, the field of crowd analysis has evolved remarkably in crowded scene understanding, including crowd behavior analysis [13, 6, 7, 10, 8, 14, 16], crowd tracking [1, 9, 17], and crowd segmentation [2, 3, 15]. Much of this progress was sparked by the creation of crowd datasets and by new, robust features and models for profiling intrinsic crowd properties. Most of the above studies on crowd understanding are scene-specific: the crowd model is learned from a specific scene and therefore generalizes poorly to other scenes. Attributes are particularly effective at characterizing generic properties across scenes. In recent years, attribute-based representations of objects, faces, actions, and scenes have drawn considerable attention as an alternative or complement to categorical representations, because they characterize the target subject by several attributes rather than by a discriminative assignment to a single specific category, which is too restrictive to describe the nature of the target subject. Furthermore, scientific studies have shown that different crowd systems share similar principles that can be characterized by some common properties or attributes. Indeed, attributes can express more information about a crowd video, as they describe a video by answering “Who is in the crowd?”, “Where is the crowd?”, and “Why is the crowd here?” rather than merely assigning it a categorical scene or event label. For instance, an attribute-based representation might describe a crowd video as the “conductor” and “choir” performing on the “stage” with the “audience” “applauding”, in contrast to a categorical label like “chorus”. Recently, some works [10, 16] have made efforts on crowd attribute profiling, but the number of attributes in these works is limited and the datasets are small in terms of scene diversity.