Convolutional LSTM Based Video Object Detection

Xiao Wang, Xiaohua Xie, Jianhuang Lai
DOI: https://doi.org/10.1007/978-3-030-03335-4_9
2018-01-01
Abstract: State-of-the-art object detection performance has improved significantly over the past two years. Despite this effectiveness on still images, transferring these powerful detection networks to video object detection remains difficult. In this work, we present a fast and accurate framework for video object detection that incorporates temporal and contextual information using a convolutional LSTM [27]. Moreover, an Encoder-Decoder module built on the convolutional LSTM predicts the feature map. The framework is trained end to end and is general and flexible enough to be combined with still-image detection networks. It achieves significant improvements in both speed and accuracy. Our method significantly improves upon strong single-frame baselines on ImageNet VID [21], especially for challenging objects moving at high speed.
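The abstract names two components: a convolutional LSTM that carries temporal context across frames, and an Encoder-Decoder module built on it that predicts a feature map. As a rough illustration only, the NumPy sketch below implements a generic ConvLSTM cell in its common formulation (all four gates computed by one convolution over the concatenated input and hidden state, with peephole terms omitted) plus a hypothetical one-step feature-map predictor. The names `conv2d_same`, `ConvLSTMCell`, `run_sequence` and `predict_next`, as well as the kernel size, channel counts, random initialization and the 1x1 output projection `w_out`, are assumptions made for this sketch and are not taken from the paper. The input `frames` stands in for per-frame feature maps produced by a still-image backbone.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w, b):
    """Naive 2D convolution with zero 'same' padding (odd kernel size assumed).

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) -> (C_out, H, W)
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window around (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


class ConvLSTMCell:
    """Generic ConvLSTM cell (hypothetical sketch, no peephole connections)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.hid_ch = hid_ch
        # One convolution yields input, forget, output and candidate gates at once.
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros(4 * hid_ch)

    def init_state(self, h, w):
        zeros = np.zeros((self.hid_ch, h, w))
        return zeros, zeros.copy()

    def step(self, x, state):
        h, c = state
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w, self.b)
        i, f, o, g = np.split(z, 4, axis=0)
        c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
        h_next = sigmoid(o) * np.tanh(c_next)  # spatial hidden state
        return h_next, c_next


def run_sequence(cell, frames):
    """Feed per-frame feature maps (T, C, H, W) through the cell; return all hidden states."""
    h, c = cell.init_state(*frames.shape[2:])
    outputs = []
    for x in frames:
        h, c = cell.step(x, (h, c))
        outputs.append(h)
    return np.stack(outputs)


def predict_next(encoder, decoder, w_out, frames):
    """Hypothetical encoder-decoder: encode past frames, decode one step ahead.

    The decoder starts from the encoder's final state, takes the last frame as
    input, and a 1x1 projection w_out (C, hid) maps its hidden state back to C channels.
    """
    h, c = encoder.init_state(*frames.shape[2:])
    for x in frames:
        h, c = encoder.step(x, (h, c))
    h, _ = decoder.step(frames[-1], (h, c))
    return np.tensordot(w_out, h, axes=([1], [0]))  # (C, H, W) predicted feature map


if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(5, 8, 6, 6))  # T=5, C=8, 6x6 maps
    cell = ConvLSTMCell(in_ch=8, hid_ch=4)
    print(run_sequence(cell, feats).shape)  # (5, 4, 6, 6)
```

Using a single convolution for all four gates keeps the recurrence spatial: each gate sees a local neighborhood of both the current features and the previous hidden state, which is what lets a ConvLSTM propagate appearance information across frames without flattening the feature map.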