System and method for video captioning

Jiang Yugang, Chen Shaoxiang
2020-01-01
Abstract: A system for video captioning comprises an encoding module and a decoding module. The encoding module comprises a plurality of encoding units, each receiving a set of video frames, where the sets of video frames received by any two neighboring encoding units are in chronological order. Each encoding unit produces a spatially attended feature, so that the plurality of encoding units together produce a spatially attended feature sequence. The decoding module comprises a decoding unit that chronologically receives a temporally attended feature obtained from the spatially attended feature sequence. A corresponding method is also disclosed.
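The data flow in the abstract (chronological frame sets, one spatially attended feature per encoding unit, then a temporally attended feature for the decoder) can be illustrated with a minimal Python sketch. The abstract does not say how either attention step is computed, so everything specific here is an assumption: frames within a set are averaged region by region, both attention steps use dot-product scoring with a softmax, and the names `spatial_query`, `decoder_state`, `encoding_unit`, `encode`, and `temporal_attention` are hypothetical. This is not the patented method, only one plausible instance of spatial-then-temporal attention.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]  # one feature vector per spatial region (assumed fixed grid)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def split_into_sets(frames: List[Frame], set_size: int) -> List[List[Frame]]:
    """Consecutive, non-overlapping frame sets, kept in chronological order."""
    return [frames[i:i + set_size] for i in range(0, len(frames), set_size)]


def encoding_unit(frame_set: List[Frame], spatial_query: Vector) -> Vector:
    """Hypothetical encoding unit: average the set's frames region by region,
    then apply dot-product spatial attention over the regions (assumption)."""
    num_regions, dim = len(frame_set[0]), len(frame_set[0][0])
    pooled = [
        [sum(frame[r][d] for frame in frame_set) / len(frame_set) for d in range(dim)]
        for r in range(num_regions)
    ]
    weights = softmax([dot(spatial_query, region) for region in pooled])
    return weighted_sum(weights, pooled)


def encode(frames: List[Frame], set_size: int, spatial_query: Vector) -> List[Vector]:
    """One encoding unit per frame set, giving the spatially attended feature sequence."""
    return [encoding_unit(s, spatial_query) for s in split_into_sets(frames, set_size)]


def temporal_attention(decoder_state: Vector, sequence: List[Vector]) -> Vector:
    """Attend over the spatially attended sequence, conditioned on the decoder's
    current state (dot-product scoring is an assumption)."""
    weights = softmax([dot(decoder_state, feat) for feat in sequence])
    return weighted_sum(weights, sequence)


if __name__ == "__main__":
    # Four frames, two regions each, 2-D region features (toy values).
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[3.0, 0.0], [0.0, 3.0]],
        [[5.0, 0.0], [0.0, 5.0]],
        [[7.0, 0.0], [0.0, 7.0]],
    ]
    seq = encode(frames, set_size=2, spatial_query=[0.0, 0.0])
    print(seq)  # [[1.0, 1.0], [3.0, 3.0]]
    print(temporal_attention([0.0, 0.0], seq))  # [2.0, 2.0]
```

With a zero query every region and every time step gets equal weight, so the outputs reduce to plain averages; a nonzero `decoder_state` shifts the temporal weights toward the frame sets most aligned with it, which is the role the abstract assigns to the temporally attended feature at each decoding step.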