The Image Caption Method Based on CNN-RNN Deep Learning and Its Optimization

Hong-jun CHEN, Fu-qiang LUO, Li-heng ZHAO, Jie ZHANG, Yao LI
DOI: https://doi.org/10.13715/j.cnki.nsjxu.2018.02.016
2018-01-01
Abstract: To improve generalization ability, the encoder-decoder model used in machine translation was introduced into image captioning. First, a CNN-RNN model was proposed, in which the CNN was responsible for encoding and the RNN, implemented as an LSTM network, was responsible for decoding. Then, considering that this model ignored the local and semantic information of the image to a certain extent, the paper proposed an improved CNN-MIL-DRN model, which took the attribute probability vector into account and deepened the nonlinear transformation by stacking multiple time states within a single time-step computation. Finally, the model was tested on MS COCO C5, using AP and 5 indexes with different thresholds as evaluation measures. A comparison with several recent models showed that the CNN-MIL-DRN model gave the best results.
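The encoder-decoder idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the CNN encoder is replaced by a hand-written feature vector, the weights are random rather than trained, and every name (`lstm_step`, `greedy_caption`, `make_params`, the toy vocabulary) is hypothetical. It shows only the decoding side, where an LSTM receives the image feature as its first input and then emits one word per step until an end token. The CNN-MIL-DRN extensions (attribute probability vector, stacked states) are not covered.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def lstm_step(x, h, c, W):
    """One standard LSTM step; W maps gate name -> matrix over [x; h]."""
    z = x + h  # list concatenation of input and previous hidden state
    i = [sigmoid(a) for a in matvec(W["i"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(W["f"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(W["o"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(W["g"], z)]  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def make_params(dim, hidden, vocab_size, seed=0):
    """Random, untrained parameters (assumption: for illustration only)."""
    rng = random.Random(seed)
    return {
        "W": {gate: rand_matrix(hidden, dim + hidden, rng) for gate in "ifog"},
        "out": rand_matrix(vocab_size, hidden, rng),    # hidden -> word scores
        "embed": rand_matrix(vocab_size, dim, rng),     # word -> input vector
    }


def greedy_caption(image_feature, params, vocab, max_len=10):
    """Decode a caption greedily, feeding the image feature as the first input."""
    hidden = len(params["W"]["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    x = image_feature
    words = []
    for _ in range(max_len):
        h, c = lstm_step(x, h, c, params["W"])
        scores = matvec(params["out"], h)
        idx = max(range(len(vocab)), key=lambda k: scores[k])
        if vocab[idx] == "<end>":
            break
        words.append(vocab[idx])
        x = params["embed"][idx]  # the chosen word becomes the next input
    return words


vocab = ["<end>", "a", "dog", "runs"]
params = make_params(dim=4, hidden=8, vocab_size=len(vocab))
feature = [0.1, -0.2, 0.3, 0.05]  # hypothetical stand-in for CNN features
print(greedy_caption(feature, params, vocab))
```

Because the weights are untrained, the printed word sequence is meaningless; the point is the data flow from image feature to word sequence. In a trained system the feature vector would come from a CNN and the parameters would be learned from captioned images.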