Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

Zhuosheng Zhang,Haojie Yu,Hai Zhao,Masao Utiyama
DOI: https://doi.org/10.1109/taslp.2021.3130972
2022-01-01
Abstract:Recent pre-trained language models (PrLMs) offer a new performant method of contextualized word representations by leveraging the sequence-level context for modeling. Although the PrLMs generally provide more effective contextualized word representations than non-contextualized models, they are still subject to a sequence of text contexts without diverse hints from multimodality. This paper thus proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance. In detail, we build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach. Analysis shows that our method with visual guidance pays more attention to content words, improves the representation diversity, and is potentially beneficial for enhancing the accuracy of disambiguation.
What problem does this paper attempt to address?