Abstract: In video understanding, a core task is to detect the objects in each frame, and recent state-of-the-art methods aim to detect all possible objects exhaustively. Few studies extract these objects according to the importance of their roles, that is, by identifying the objects that viewers focus on and that attract most of the audience's attention in a short video clip. This paper models the audience's attention-transfer mechanism through these attractive objects (key objects) and proposes a key object detection network based on temporal reinforcement learning (RL). Here, temporal RL means that our model uses an LSTM to output RL reward values dynamically, which avoids the risk that a hand-crafted, fixed reward setting produces RL results that fall short of expectations. To mark the object that attracts the audience's attention across successive video clips, our method keeps track of the attractive object by iterating the value of the temporal RL strategy. With the help of a temporal RL strategy that analyses attention transfer, the proposed model can detect multiple key objects simultaneously. Specifically, the spatial and temporal features extracted by ensemble convolutional networks are fed into the RL model to represent the objects, their corresponding motions, and the less relevant background. The Google AVA [24] dataset, which annotates the positions and actions of the main characters among many existing objects, is selected as the experimental dataset. Compared with other models, the results show that the proposed method can continuously focus on the attractive objects across sequential fragments. Unlike the general structure of deep reinforcement learning models based on the DQN network, our temporal RL parameters are mainly generated and calculated in the KOD-attention block.
Therefore, in simulating the attention-transfer mechanism, our method uses lightweight computation to achieve satisfactory precision and speed in key object detection.
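The abstract describes temporal RL only at a high level: an LSTM produces the reward values instead of a fixed, hand-designed reward. The pure-Python sketch below illustrates that idea under stated assumptions and is not the KOD network itself. The class `TinyLSTMReward`, the function `select_key_objects`, the random (untrained) weights, the linear read-out from the hidden state to a scalar reward, and the rule of ranking object tracks by accumulated reward are all hypothetical choices made for illustration. Each track is assumed to be a sequence of per-frame feature vectors for one candidate object (for example, concatenated spatial and temporal features).

```python
import math
import random
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMReward:
    """Hypothetical LSTM mapping a feature sequence to one reward per time step.

    Weights are random here (assumption); a real system would learn them.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o),
        # candidate (g); each is hidden_dim x (input_dim + hidden_dim).
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                      for _ in range(hidden_dim)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        # Hypothetical linear read-out from the hidden state to a scalar reward.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def _gate(self, k: str, z: Sequence[float]) -> List[float]:
        # Affine pre-activation W_k z + b_k for gate k.
        return [sum(w * x for w, x in zip(row, z)) + bias
                for row, bias in zip(self.W[k], self.b[k])]

    def rewards(self, features: Sequence[Sequence[float]]) -> List[float]:
        """Run the LSTM over the sequence and emit a reward at every step."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        out: List[float] = []
        for x in features:
            z = list(x) + h  # concatenate current input and previous hidden state
            i = [sigmoid(v) for v in self._gate("i", z)]
            f = [sigmoid(v) for v in self._gate("f", z)]
            o = [sigmoid(v) for v in self._gate("o", z)]
            g = [math.tanh(v) for v in self._gate("g", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            out.append(sum(w * hj for w, hj in zip(self.w_out, h)))
        return out


def select_key_objects(model: TinyLSTMReward,
                       tracks: Dict[str, Sequence[Sequence[float]]],
                       k: int) -> List[str]:
    """Rank object tracks by accumulated reward and return the top-k ids."""
    scores = {oid: sum(model.rewards(feats)) for oid, feats in tracks.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    model = TinyLSTMReward(input_dim=4, hidden_dim=3)
    rng = random.Random(1)
    # Three hypothetical candidate objects, each observed over five frames.
    tracks = {f"obj{n}": [[rng.random() for _ in range(4)] for _ in range(5)]
              for n in range(3)}
    print(select_key_objects(model, tracks, k=2))
```

Returning the top-k tracks mirrors the claim that several key objects can be detected at once. The sketch omits training entirely; in the described method the temporal RL parameters are generated in the KOD-attention block, whose internals the abstract does not specify.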

Deep Reinforcement Learning for Automatic Thumbnail Generation

Emotion Attention-Aware Collaborative Deep Reinforcement Learning for Image Cropping

CropNet: Real-Time Thumbnailing

Automatic Thumbnail Generation Based on Visual Representativeness and Foreground Recognizability

Personalized Automatic Image Annotation Based on Reinforcement Learning

Deep reinforcement learning enables adaptive-image augmentation for automated optical inspection of plant rust

Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks

Aesthetic Photo Collage with Deep Reinforcement Learning

A2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping

Video Key Object Detection Network via Reinforcement Learning

PR-RL: Portrait Relighting Via Deep Reinforcement Learning

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

RLSS: A Deep Reinforcement Learning Algorithm for Sequential Scene Generation

Better Deep Visual Attention with Reinforcement Learning in Action Recognition

Action Parsing-Driven Video Summarization Based on Reinforcement Learning

Attention-Aware Deep Reinforcement Learning for Video Face Recognition

Object-sensitive Deep Reinforcement Learning

Deep Reinforcement Learning for Image Hashing

Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward

Image Quality Assessment in Visual Reinforcement Learning for Fast-moving Targets

Video Summarisation by Classification with Deep Reinforcement Learning