PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization

Vijay Jaisankar, Sambaran Bandyopadhyay, Kalp Vyas, Varre Chaitanya, Shwetha Somasundaram
2024-05-31
Abstract: A poster from a long input document can be considered as a one-page, easy-to-read multimodal (text and images) summary presented on a nice template with good design elements. Automatic transformation of a long document into a poster is a little-studied but challenging task. It involves content summarization of the input document followed by template generation and harmonization. In this work, we propose a novel deep submodular function that can be trained on ground-truth summaries to extract multimodal content from the document and that explicitly ensures good coverage, diversity, and alignment of text and images. Then, we use an LLM-based paraphraser and propose to generate a template with various design aspects conditioned on the input content. We show the merits of our approach through extensive automated and human evaluations.
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the automatic generation of a visually rich poster from a long multimodal document (text and images). Transforming a long document into a visually appealing poster is a little-researched but challenging task that involves content summarization, template generation, and harmonization of design elements. The paper proposes an end-to-end pipeline called PostDoc, which uses a deep submodular function, trained on ground-truth summaries, to select multimodal content from the document while ensuring coverage, diversity, and alignment of text and images. The selected content is then rewritten by an LLM-based paraphraser, and a template is generated conditioned on that content. The effectiveness of the method is demonstrated through both automated and human evaluations. The method is presented as overcoming limitations of existing models in producing multimodal output, handling long text, and avoiding hallucination.
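To make the idea of submodular content selection concrete, here is a minimal Python sketch of greedy selection under a classic, hand-written submodular objective: a facility-location coverage term plus a square-root diversity term over clusters. This is not the paper's deep submodular function, which is learned from ground-truth summaries. The function names, the toy embeddings, the cluster labels, and the weight `lam` are all assumptions made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def coverage(selected, sim):
    """Facility-location coverage: each item in the document is credited
    with its similarity to the closest selected item."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def diversity(selected, clusters):
    """Sum over clusters of sqrt(number of selected items in the cluster).
    The concave sqrt rewards spreading picks across clusters."""
    counts = {}
    for j in selected:
        counts[clusters[j]] = counts.get(clusters[j], 0) + 1
    return sum(sqrt(c) for c in counts.values())


def greedy_select(embeddings, clusters, budget, lam=1.0):
    """Greedily pick up to `budget` items maximizing
    coverage + lam * diversity (a monotone submodular objective)."""
    n = len(embeddings)
    # Clamp similarities at 0 so the coverage term stays monotone.
    sim = [[max(0.0, cosine(embeddings[i], embeddings[j])) for j in range(n)]
           for i in range(n)]

    def score(s):
        return coverage(s, sim) + lam * diversity(s, clusters)

    selected = []
    while len(selected) < min(budget, n):
        current = score(selected)
        best, best_gain = None, 0.0
        for j in range(n):
            if j in selected:
                continue
            gain = score(selected + [j]) - current
            if best is None or gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return selected


# Toy data (hypothetical): four content items in two topical clusters.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = [0, 0, 1, 1]
picked = greedy_select(embeddings, clusters, budget=2)
print(picked)
```

With a budget of two, the sketch picks one item from each cluster, because a second item from an already covered cluster adds little coverage and less diversity reward. For monotone submodular objectives under a cardinality budget, greedy selection of this kind is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is one reason such functions are attractive for extractive summarization.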