Abstract: Panoramic image generation has emerged as an important task in image generation, driven by growing demands for large-scale visuals in creative and technical applications. While diffusion models have dominated this field, they face inherent limitations, including the multilevel-coherence challenge and implementation complexity, leading to suboptimal outcomes. In this paper, we introduce PanoLlama, a novel framework that redefines panoramic image generation as a next-token prediction task. Building on the pre-trained LlamaGen architecture, we generate images in an autoregressive manner and develop an expansion strategy to handle size limitations. This method aligns with the image token structure in a crop-wise and training-free manner, resulting in high-quality panoramas with minimal seams and maximum scalability. PanoLlama demonstrates its effectiveness and versatility in our experiments, achieving the best overall performance while offering flexibility for multi-scale, multi-layout, and multi-guidance generation. It overcomes the challenges that diffusion-based methods fail to address, setting a new paradigm for panoramic image generation tasks. Code is available at https://github.com/0606zt/PanoLlama.

What problem does this paper attempt to address?

This paper attempts to solve two main problems in Panoramic Image Generation (PIG):

1. **Multilevel-Coherence Challenge**: The goal is to achieve coherence between low-level features (such as color, texture, and edges) and high-level features (such as layout, structure, and semantics) in panoramic image generation. Diffusion models struggle to define and balance this multilevel coherence.
2. **Implementation Complexity**: Diffusion models require complex algorithm designs to coordinate the denoising paths between different image patches, which affects the stability and scalability of the system.

To solve these problems, the paper proposes the PanoLlama framework, which redefines panoramic image generation as a next-token prediction task. Specifically:

- **New Paradigm**: By leveraging pre-trained autoregressive (AR) models, PanoLlama generates high-quality panoramas in a training-free manner, avoiding the multilevel-coherence and implementation-complexity problems of traditional diffusion models.
- **Speed Up**: PanoLlama does not need to perform time-consuming denoising iterations and optimization processes, which significantly increases generation speed.
- **Versatile Applications**: Besides text-to-panorama generation, PanoLlama also supports multi-scale, multi-layout, and multi-guidance generation, offering greater flexibility and adaptability.

In summary, by introducing autoregressive models and sequence-generation methods, PanoLlama addresses the limitations of existing diffusion models in panoramic image generation and provides a more efficient and flexible solution.
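To make the idea of crop-wise, training-free expansion with next-token prediction more concrete, here is a minimal toy sketch. It is not the PanoLlama implementation: the sampler `toy_next_token`, the constants `VOCAB_SIZE`, `HEIGHT`, `CROP_WIDTH`, and `OVERLAP`, and the choice to reuse the last columns of the previous crop as a prefix are all assumptions made for illustration. A real system would replace the toy sampler with a pre-trained autoregressive image model and decode the tokens to pixels.

```python
import random

# All constants below are hypothetical values chosen for illustration.
VOCAB_SIZE = 16   # toy codebook size
HEIGHT = 4        # token rows in one crop
CROP_WIDTH = 6    # token columns the toy model handles per crop
OVERLAP = 2       # columns of the previous crop reused as context


def toy_next_token(context, rng):
    """Stand-in for a pre-trained AR model: pick the next token id.

    It repeats the previous raster-order token with probability 0.7,
    otherwise it samples a uniformly random token id.
    """
    if context and rng.random() < 0.7:
        return context[-1]
    return rng.randrange(VOCAB_SIZE)


def generate_crop(prefix_cols, rng):
    """Fill one HEIGHT x CROP_WIDTH token grid in raster order.

    prefix_cols is None for the first crop; otherwise it holds, per row,
    the OVERLAP tokens copied from the right edge of the panorama so far.
    """
    grid = []
    context = []  # flattened raster-order history seen by the sampler
    for r in range(HEIGHT):
        row = []
        for c in range(CROP_WIDTH):
            if prefix_cols is not None and c < OVERLAP:
                tok = prefix_cols[r][c]  # reuse already generated tokens
            else:
                tok = toy_next_token(context, rng)
            row.append(tok)
            context.append(tok)
        grid.append(row)
    return grid


def generate_panorama(num_crops, seed=0):
    """Extend a token grid to the right, one crop at a time."""
    rng = random.Random(seed)
    panorama = generate_crop(None, rng)
    for _ in range(num_crops - 1):
        prefix = [row[-OVERLAP:] for row in panorama]
        crop = generate_crop(prefix, rng)
        for r in range(HEIGHT):
            # Append only the newly generated columns of the crop.
            panorama[r].extend(crop[r][OVERLAP:])
    return panorama


if __name__ == "__main__":
    for row in generate_panorama(num_crops=3):
        print(" ".join(f"{tok:2d}" for tok in row))
```

In this sketch each additional crop adds `CROP_WIDTH - OVERLAP` new token columns, so the width can grow without retraining the sampler; only the prefix handed to it changes. The sampler is called once per new token, in contrast to the repeated denoising passes over overlapping patches that the summary attributes to diffusion-based methods.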

PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Taming Vector-Wise Quantization for Wide-Range Image Blending with Smooth Transition

Aerial-PASS: Panoramic Annular Scene Segmentation in Drone Videos

PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

L-MAGIC: Language Model Assisted Generation of Images with Coherence

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

VidPanos: Generative Panoramic Videos from Casual Panning Videos

Cross-View Panorama Image Synthesis

Multi-Viewpoint Panorama Construction with Wide-Baseline Images

HORIZON: A High-Resolution Panorama Synthesis Framework

Panoramic Image Generation: From 2-D Sketch to Spherical Image

A Feature-Based Approach to Panorama Generation

TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models

Image Synthesis from Layout with Locality-Aware Mask Adaption

Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction

Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models

Taming Stable Diffusion for Text to 360° Panorama Image Generation