Video Key Object Detection Network via Reinforcement Learning
Yue Li, Xiangchun Zhou, Tao Cui, Ruohan Gong, Zuqi Tang, Chuang Wang, Wei Wang
DOI: https://doi.org/10.1109/ISCAS45731.2020.9181243
2020-01-01
Abstract: In video understanding, a core task is to detect the objects in each frame, and recent state-of-the-art methods aim to detect all possible objects exhaustively. Few studies distinguish these objects by the importance of their roles, that is, by which objects matter most and attract the most audience attention in a short video clip. This paper represents the audience's attention transfer mechanism through attractive objects (key objects) and proposes a key object detection network based on temporal reinforcement learning (RL). Temporal RL means that the model uses an LSTM to output RL reward values dynamically, which reduces the risk that a fixed reward setting causes the RL result to fall short of expectations. To mark the object that attracts the audience's attention across successively switching video clips, the method keeps track of the attractive object by iterating the value of the temporal RL strategy. The proposed model can detect multiple key objects simultaneously with the help of a temporal RL strategy that analyses attention transfer. Specifically, spatial and temporal features, extracted by ensemble convolutional networks, are fed into the RL model to represent the objects, their corresponding motions, and the less relevant backgrounds. The Google AVA [24] dataset, which annotates the position and action of the main characters among many existing objects, is used as the experimental dataset. Compared with other models, the results show that the proposed method can continuously focus on the attractive objects across sequential clips. Unlike the general structure of deep reinforcement learning models based on DQN, the temporal RL parameters are mainly generated and calculated in the KOD-attention block. Therefore, in simulating the attention transfer mechanism, the method uses lightweight computation to achieve satisfactory precision and speed in key object detection.
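
The idea of an LSTM that emits a reward value at every time step can be illustrated with a minimal sketch. This is not the paper's KOD-attention block. The class name `LSTMRewardHead`, the random weights, the `tanh` read-out, and the feature dimensions are all hypothetical choices made for illustration. Each frame is assumed to be represented by a flat list holding spatial features followed by temporal features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMRewardHead:
    """Hypothetical sketch: one LSTM cell plus a linear read-out that
    emits a scalar reward per frame (weights are random, untrained)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        cols = input_dim + hidden_dim
        self.hidden_dim = hidden_dim
        # Stacked gate weights; row blocks are input, forget, candidate, output.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(cols)]
                  for _ in range(4 * hidden_dim)]
        self.b = [0.0] * (4 * hidden_dim)
        # Read-out vector mapping the hidden state to a scalar reward.
        self.w_r = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]

    def step(self, x, h, c):
        n = self.hidden_dim
        z = [a + b for a, b in zip(matvec(self.W, x + h), self.b)]
        i, f, g, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
        # Standard LSTM cell-state and hidden-state updates.
        c = [sigmoid(fk) * ck + sigmoid(ik) * math.tanh(gk)
             for ik, fk, gk, ck in zip(i, f, g, c)]
        h = [sigmoid(ok) * math.tanh(ck) for ok, ck in zip(o, c)]
        # Assumed read-out: tanh keeps the reward inside (-1, 1).
        reward = math.tanh(sum(w * hk for w, hk in zip(self.w_r, h)))
        return reward, h, c

    def rewards(self, frames):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        out = []
        for x in frames:
            r, h, c = self.step(list(x), h, c)
            out.append(r)
        return out


# Usage: 5 frames, each with 8 spatial + 4 temporal feature values (made up).
frames = [[1.0] * 8 + [0.5] * 4 for _ in range(5)]
head = LSTMRewardHead(input_dim=12, hidden_dim=6)
frame_rewards = head.rewards(frames)
```

Because the reward depends on the recurrent state, the same frame features can yield different rewards at different time steps, which is the sense in which such a reward is dynamic rather than fixed in advance.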