Diffusion-Cap: A Diffusion Model for Image Captioning

Tingxuan Xiao, Yizhe Yang, Yang Gao
DOI: https://doi.org/10.1088/1742-6596/2858/1/012048
2024-01-01
Journal of Physics Conference Series
Abstract: While autoregressive image caption generation models have achieved remarkable success, they are still constrained in generation speed, which may become a bottleneck in practical applications. To address this challenge, our research introduces the Diffusion Image Captioning Model (Diffusion-Cap). This innovative, non-autoregressive framework conceptualizes image captioning as a process that unifies continuous and discrete diffusion. By exploring two distinct architectural designs, our research investigates optimal strategies for the image captioning diffusion process. Comprehensive assessments on the MSCOCO dataset confirm the superior performance of our Diffusion-Cap, surpassing existing non-autoregressive benchmarks in crafting accurate captions.