A Framework for Knowing Who is Doing What in Aerial Surveillance Videos.

Fan Yang, Sakriani Sakti, Yang Wu, Satoshi Nakamura
DOI: https://doi.org/10.1109/access.2019.2924188
IF: 3.9
2019-01-01
IEEE Access
Abstract: Ultra-high-resolution aerial videos are used to relieve the shortage of surveillance systems in sparsely populated regions. For practical applications, it is important to automatically analyze "who is doing what?" in such videos. Although atomic visual action (AVA) detection has been successfully used to recognize "who is doing what?" in movie data, it is challenging to adapt it to ultra-high-resolution aerial videos, where the target persons are relatively tiny and sparsely located. Moreover, for lack of suitable evaluation metrics, AVA detection has been evaluated with single-label actions; however, evaluating with multi-label actions is more reasonable, since a person can perform several actions simultaneously (e.g., making a phone call while walking). To tackle these issues, we propose a novel framework for multi-label AVA detection in ultra-high-resolution aerial videos and introduce novel metrics for evaluating multi-label AVA detection. The experimental results demonstrate that our framework outperforms other methods for interpreting "who is doing what?" in our target task.