An Explainable Toolbox for Evaluating Pre-trained Vision-Language Models.

Tiancheng Zhao, Tianqi Zhang, Mingwei Zhu, Haozhan Shen, Kyusong Lee, Xiaopeng Lu, Jianwei Yin
DOI: https://doi.org/10.18653/v1/2022.emnlp-demos.4
2022-01-01
Abstract: We introduce VL-CheckList, a toolbox for evaluating Vision-Language Pretraining (VLP) models, along with a benchmark dataset for fine-grained VLP model analysis. Most existing VLP models are evaluated by comparing their finetuned downstream task performance. However, average downstream task accuracy alone provides little information about the pros and cons of each VLP method. In this paper, we demonstrate how minor input changes in language and vision affect the prediction outputs. We also provide a guideline for the research community to use and contribute to this toolbox. Lastly, a case study based on VL-CheckList is conducted to analyze one of the representative VLP models.