Temporal Label Aggregation for Unintentional Action Localization

Nuoxing Zhou, Guangyi Chen, Jinglin Xu, Wei-Shi Zheng, Jiwen Lu
DOI: https://doi.org/10.1109/icme51207.2021.9428125
2021-01-01
Abstract: Humans can easily tell whether a person’s action is intentional or not. However, teaching a machine to recognize this is very challenging because referable comparisons and reliable annotations are lacking. Given a video containing an unintentional action, the annotations are usually unreliable because of the intrinsic ambiguity among multiple annotators and their subjective appraisals. To address this problem, we propose a new framework that aggregates multiple probabilistic labels online for unintentional action localization. Specifically, we first model the uncertainty of the annotations with a temporal probability distribution, and then develop a label attention model to aggregate the reliable annotations in an online manner. We evaluate our method on the public OOPS dataset, where each video contains multiple annotations of the unintentional action. Our experimental results show that mining reliable supervision from multiple unreliable annotations achieves significant improvements over the baseline methods.
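The two steps named in the abstract (probabilistic temporal labels, then attention-based aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the form of the distribution or of the attention, so the Gaussian soft label, the dot-product agreement score, the softmax weighting, and the names `soft_label`, `aggregate`, `sigma`, and `temperature` are all assumptions made for illustration.

```python
import numpy as np


def soft_label(t_annot, num_frames, sigma=2.0):
    """Spread one annotated frame index into a distribution over frames.

    Assumption: a discretized Gaussian centered on the annotation.
    """
    frames = np.arange(num_frames)
    logits = -0.5 * ((frames - t_annot) / sigma) ** 2
    p = np.exp(logits - logits.max())
    return p / p.sum()


def aggregate(annotations, model_pred, num_frames, sigma=2.0, temperature=0.05):
    """Attention-weighted combination of several annotators' soft labels.

    Assumption: each annotation is scored by its dot product with the
    model's current prediction, and scores pass through a softmax.
    """
    labels = np.stack([soft_label(t, num_frames, sigma) for t in annotations])
    scores = labels @ model_pred / temperature  # one score per annotator
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    target = weights @ labels  # aggregated distribution over frames
    return target, weights


# Hypothetical example: three annotators, one of them an outlier.
num_frames = 40
annotations = [10, 11, 30]
model_pred = soft_label(10, num_frames)  # stand-in for a network output
target, weights = aggregate(annotations, model_pred, num_frames)
```

In this sketch the outlying annotation at frame 30 receives the smallest weight because it overlaps least with the current prediction. Recomputing the weights at every training step, as the prediction changes, is one way to read "online" aggregation here.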