MMC: Multi-modal colorization of images using textual description
Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, Michael Blumenstein
DOI: https://doi.org/10.1007/s11760-024-03650-y
IF: 1.583
2024-12-11
Signal Image and Video Processing
Abstract: Handling multiple objects with different colours is a significant challenge for image colourisation techniques, and existing algorithms often fail to maintain colour consistency in complex real-world scenes. In this work, we integrate textual descriptions as an auxiliary condition, alongside the greyscale image to be colourised, to improve the fidelity of the colourisation process. We propose a deep network that takes two inputs (the greyscale image and its encoded text description) and predicts the relevant colour components. We also identify each object in the image and colourise it using its individual description, so that object-specific attributes are incorporated into the colourisation process. A fusion model then fuses all the image objects (segments) to generate the final colourised image. Because the textual descriptions contain colour information about the objects in the image, the text encoding helps improve the overall quality of the predicted colours. The proposed method outperforms existing colourisation techniques in terms of the LPIPS, PSNR and SSIM metrics.
engineering, electrical & electronic; imaging science & photographic technology
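
The abstract reports results in terms of PSNR, among other metrics. As a reminder of what that metric measures, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), using NumPy. The function name `psnr`, its parameters, and the 8-bit default of `max_value=255.0` are illustrative assumptions; this is not the authors' evaluation code, and the paper's exact evaluation protocol is not given in this record.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Hypothetical helper for illustration; max_value assumes 8-bit images.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A higher PSNR indicates that the colourised output is closer, pixel by pixel, to the ground-truth colour image. LPIPS and SSIM capture perceptual and structural similarity respectively and are usually computed with dedicated libraries.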