An Analysis of Action Recognition Datasets for Language and Vision Tasks

Spandana Gella, Frank Keller
DOI: https://doi.org/10.48550/arXiv.1704.07129
2017-04-24
Computation and Language
Abstract: A large amount of recent research has focused on tasks that combine language and vision, resulting in a proliferation of datasets and methods. One such task is action recognition, whose applications include image annotation, scene understanding and image retrieval. In this survey, we categorize the existing approaches based on how they conceptualize this problem and provide a detailed review of existing datasets, highlighting their diversity as well as advantages and disadvantages. We focus on recently developed datasets which link visual information with linguistic resources and provide a fine-grained syntactic and semantic analysis of actions in images.