Learning Visual and Textual Representations for Multimodal Matching and Classification

Yu Liu, Li Liu, Yanming Guo, Michael S. Lew
DOI: https://doi.org/10.1016/j.patcog.2018.07.001
IF: 8
2018-01-01
Pattern Recognition
Abstract:
• A unified network for image-text matching and classification.
• Seamlessly incorporating the matching and classification components.
• A multi-stage training algorithm by combining the matching and classification loss.
• Comprehensive study on the effectiveness of the proposed approach.
• Comparisons on four well-known multimodal benchmarks.
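The idea of combining a matching loss with a classification loss can be illustrated with a small sketch. The highlights do not state the exact loss forms, so everything below is an assumption for illustration: a bidirectional hinge ranking loss over cosine similarities for matching, softmax cross-entropy for classification, and the hypothetical names `matching_loss`, `classification_loss`, `joint_loss`, `margin`, and `weight`. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matching_loss(img_embs, txt_embs, margin=0.2):
    """Assumed bidirectional hinge ranking loss.

    img_embs[i] and txt_embs[i] form a matching pair; every other
    combination in the batch is treated as a negative.
    """
    n = len(img_embs)
    sim = [[cosine(img_embs[i], txt_embs[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i should score higher with its own text than with text j.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Text i should score higher with its own image than with image j.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss / n


def classification_loss(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(img_embs, txt_embs, logits_batch, labels, weight=1.0):
    """Matching loss plus a weighted mean classification loss (assumed form)."""
    cls = sum(classification_loss(l, y) for l, y in zip(logits_batch, labels))
    cls /= len(labels)
    return matching_loss(img_embs, txt_embs) + weight * cls
```

Under these assumptions, a staged schedule could be imitated by setting `weight=0.0` in an early stage (matching only) and raising it in a later stage; whether the paper's multi-stage algorithm works this way is not stated in the highlights.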