Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space

Lingxiao Li, Kaixuan Fan, Boqing Gong, Xiangyu Yue
2024-11-27
Abstract: Few-shot image generation aims to generate diverse and high-quality images for an unseen class given only a few examples in that class. However, existing methods often suffer from a trade-off between image quality and diversity while offering limited control over the attributes of newly generated images. In this work, we propose Hyperbolic Diffusion Autoencoders (HypDAE), a novel approach that operates in hyperbolic space to capture hierarchical relationships among images and texts from seen categories. By leveraging pre-trained foundation models, HypDAE generates diverse new images for unseen categories with exceptional quality by varying semantic codes or guided by textual instructions. Most importantly, the hyperbolic representation introduces an additional degree of control over semantic diversity through the adjustment of radii within the hyperbolic disk. Extensive experiments and visualizations demonstrate that HypDAE significantly outperforms prior methods by achieving a superior balance between quality and diversity with limited data and offers a highly controllable and interpretable generation process.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the trade-off between image quality and diversity in **few-shot image generation**, along with the limited control that existing methods offer over the attributes of newly generated images. The task itself is hard: a model must generate images of an unseen class from only a few examples of that class.

### Problem Background

1. **Challenges in Few-shot Image Generation**:
   - Existing methods usually trade image quality against diversity.
   - Generating diverse, high-quality images for an unseen class from only a few samples is difficult.
   - Existing methods cannot precisely control specific attributes of newly generated images.
2. **Limitations of Existing Methods**:
   - GAN-based methods struggle to generate images that are both diverse and high-quality.
   - Euclidean representations have difficulty capturing the hierarchical structure among image classes, which limits the quality and diversity of generated images.

### Solution

To overcome these problems, the authors propose **Hyperbolic Diffusion Autoencoders (HypDAE)**, a method that operates in hyperbolic space. Its main contributions are:

1. **Introducing Hyperbolic Space**:
   - Hyperbolic space naturally represents hierarchical structures, which makes it suitable for capturing the semantic relationships between images and text.
   - Adjusting the radius in the Poincaré disk controls the semantic diversity of generated images.
2. **Combining Diffusion Models**:
   - Pre-trained diffusion models (such as Stable Diffusion) make it possible to generate high-quality, diverse images from a small amount of data.
   - Diffusion models provide richer detail and higher generation quality.
3. **Controllable Image Editing**:
   - HypDAE supports text-guided image editing, allowing users to specify the direction and features of generated images.
   - Adjusting parameters in hyperbolic space gives flexible control over generated images.

### Experimental Results

Experiments show that HypDAE significantly outperforms existing methods on multiple datasets: it generates diverse images while maintaining high quality, and it provides better controllability and interpretability.

### Summary

This paper addresses the trade-off between image quality and diversity in few-shot image generation by combining hyperbolic space with diffusion models, yielding a method that generates high-quality, diverse, and controllable images.