Extraction method of semantic information of video images

Yin Shouyi, Yang Jianxun, Ouyang Peng, Liu Leibo, Wei Shaojun
2017-01-01
Abstract: The invention provides a method for extracting semantic information from video images and relates to the technical field of video description and annotation. First, a frame sequence is extracted from a video at a set inter-frame interval, and a feature vector is extracted from each frame image using a convolutional neural network. The feature vectors serve as input to an LSTM network encoder. The output of the LSTM network encoder at each time step and the output of an LSTM network decoder at the previous time step serve as input to an external storage EMM, and the contents of the storage matrix in the external storage EMM are updated. The external storage EMM outputs two read vectors, which serve as input vectors for decoding and encoding, respectively, at the subsequent time step. The two LSTM networks dynamically control reading from and writing to the external storage EMM, so that the feature vector of each frame image of the video is stored during the encoding phase. During the decoding phase, feedback from the predicted words adjusts the output of the external storage at the subsequent time step, so that when an annotation of the video is generated, the context feature vectors are adjusted according to the word sequence generated so far.
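The data flow described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the patented implementation: the abstract does not say how the external storage EMM is addressed or updated, so the dot-product softmax addressing and the blending write rule below are assumptions, and every name (`sample_frame_indices`, `ExternalMemory`, `emm_step`) is hypothetical. The CNN and the two LSTM networks are omitted; their outputs are represented by plain lists of floats.

```python
import math


def sample_frame_indices(num_frames, interval):
    """Return the indices of frames kept when taking one frame every `interval` frames."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, num_frames, interval))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ExternalMemory:
    """Toy stand-in for the external storage EMM: a slots x width storage matrix.

    Assumption: slots are addressed by a softmax over dot products with a key.
    The abstract does not specify the addressing scheme.
    """

    def __init__(self, slots, width):
        self.matrix = [[0.0] * width for _ in range(slots)]

    def _address(self, key):
        scores = [sum(k * m for k, m in zip(key, row)) for row in self.matrix]
        return softmax(scores)

    def write(self, vector):
        """Blend `vector` into every slot, weighted by its address weights (assumed rule)."""
        weights = self._address(vector)
        for w, row in zip(weights, self.matrix):
            for j, v in enumerate(vector):
                row[j] += w * (v - row[j])

    def read(self, key):
        """Return the address-weighted sum of the slots."""
        weights = self._address(key)
        width = len(self.matrix[0])
        return [sum(w * row[j] for w, row in zip(weights, self.matrix))
                for j in range(width)]


def emm_step(memory, encoder_out, prev_decoder_out):
    """One time step: update the memory from both inputs, then emit two read vectors.

    Returns (read_for_decoder, read_for_encoder), to be fed to the decoder and
    the encoder at the subsequent time step.
    """
    memory.write(encoder_out)
    memory.write(prev_decoder_out)
    read_for_decoder = memory.read(prev_decoder_out)
    read_for_encoder = memory.read(encoder_out)
    return read_for_decoder, read_for_encoder
```

For example, `sample_frame_indices(10, 4)` keeps frames 0, 4 and 8. In a full system, `encoder_out` and `prev_decoder_out` would come from the LSTM encoder and decoder, and the two vectors returned by `emm_step` would be passed to those networks at the next time step.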