FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition

Chen Hu, Jingjing Deng, Xianghua Xie, Xiaoke Ma
2024-08-30
Abstract: Federated learning is a machine learning paradigm that enables decentralized clients to collaboratively learn a shared model while keeping all the training data local. While considerable research has focused on federated image generation, particularly Generative Adversarial Networks, Variational Autoencoders have received less attention. In this paper, we address the challenges of non-IID (non-independent and identically distributed) data environments featuring multiple groups of images of different types. Specifically, heterogeneous data distributions can lead to difficulties in maintaining a consistent latent space and can also result in local generators with disparate texture features being blended during aggregation. We introduce a novel approach, FissionVAE, which decomposes the latent space and constructs decoder branches tailored to individual client groups. This method allows for customized learning that aligns with the unique data distributions of each group. Additionally, we investigate the incorporation of hierarchical VAE architectures and demonstrate the use of heterogeneous decoder architectures within our model. We also explore strategies for setting the latent prior distributions to enhance the decomposition process. To evaluate our approach, we assemble two composite datasets: the first combines MNIST and FashionMNIST; the second comprises RGB datasets of cartoon and human faces, wild animals, marine vessels, and remote sensing images of Earth. Our experiments demonstrate that FissionVAE greatly improves generation quality on these datasets compared to baseline federated VAE models.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper aims to address the challenge of generating high-quality images under non-independent and identically distributed (non-IID) data conditions in a Federated Learning (FL) environment. Specifically, the paper focuses on the issues encountered by Variational Autoencoders (VAEs) when dealing with multiple different types of image datasets.

### Background and Challenges

1. **Inconsistent Data Distribution**: In federated learning, the data distribution of each client may differ, leading to inconsistencies in the models of each client during training. This inconsistency, particularly for generative models like VAEs, can cause the shared latent space to be difficult to maintain consistently, thereby affecting the quality of the generated images.
2. **Generator Fusion Problem**: When aggregating the generators of each client, the significant differences in data characteristics across clients can result in generated images with mixed features, meaning the generated images contain features from different datasets. This severely impacts the authenticity and quality of the generated images.
3. **Limitations of Existing Methods**: Existing research mainly focuses on Generative Adversarial Networks (GANs), with less attention given to VAEs. Although some methods attempt to mitigate these issues by exchanging local discriminators or grouping and aggregating generators, these methods pose risks to client privacy and still fall short in generating high-quality images.

### Solution

To address the above issues, the paper proposes a new model called FissionVAE. The main innovations of FissionVAE include:

1. **Latent Space Decomposition**: FissionVAE decomposes the latent space according to different data groups, with each data group corresponding to a unique prior distribution. This ensures that each client's data is mapped to its corresponding latent distribution, avoiding the mixing of latent spaces between different data groups.
2. **Customized Decoder Branches**: FissionVAE designs specialized decoder branches for each client group, allowing these branches to learn in a customized manner based on the characteristics of their respective data groups. This better preserves the unique visual features of different image types.
3. **Hierarchical Inference Architecture**: FissionVAE introduces a hierarchical inference architecture, allowing the use of deeper network structures to capture more complex data distributions. This architecture not only improves the quality of the generated images but also increases the model's flexibility, enabling clients with different computational resources to use different decoder architectures.

### Experimental Validation

To validate the effectiveness of FissionVAE, the paper constructs two composite datasets:

1. **Mixed MNIST**: Combines the MNIST and FashionMNIST datasets, containing handwritten digits and clothing images, respectively.
2. **CHARM**: Includes five different datasets, namely anime faces, real human faces, animals, remote sensing images, and marine vessel images.

Experimental results show that FissionVAE significantly outperforms baseline federated VAE models in terms of generation quality on these datasets, particularly in reducing the mixed features of generated images.

### Conclusion

By proposing the FissionVAE model, the paper effectively addresses the challenge of generating high-quality images under non-IID data conditions in federated learning. The model significantly improves the quality and diversity of generated images through latent space decomposition, customized decoder branches, and a hierarchical inference architecture.
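
To make the latent space decomposition and the group-specific decoder branches from the Solution section more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes that each group's prior is a unit-variance Gaussian distinguished only by its mean, that the encoder outputs a diagonal Gaussian posterior, and that decoder branches are combined by unweighted averaging among clients of the same group. The names `GROUP_PRIOR_MEANS`, `kl_to_group_prior`, and `aggregate_decoder_branches` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict

# Hypothetical priors: one unit-variance Gaussian per client group,
# distinguished only by its mean vector (an assumption for illustration).
GROUP_PRIOR_MEANS = {
    "mnist": [-2.0, 0.0],
    "fashion": [2.0, 0.0],
}


def kl_to_group_prior(mu, log_var, group):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mean, I) ) for one sample.

    Using the group's own prior in the KL term pulls each group's latent
    codes toward a separate region of the latent space.
    """
    prior_mean = GROUP_PRIOR_MEANS[group]
    kl = 0.0
    for m, lv, p in zip(mu, log_var, prior_mean):
        kl += 0.5 * (math.exp(lv) + (m - p) ** 2 - 1.0 - lv)
    return kl


def aggregate_decoder_branches(client_updates):
    """Average decoder weights separately within each group.

    client_updates: list of (group, weights) pairs, where weights is a
    flat list of floats standing in for one decoder branch's parameters.
    Branches from different groups are never mixed (assumed here to be
    a plain unweighted mean per group).
    """
    by_group = defaultdict(list)
    for group, weights in client_updates:
        by_group[group].append(weights)
    return {
        group: [sum(column) / len(column) for column in zip(*weight_lists)]
        for group, weight_lists in by_group.items()
    }
```

In this sketch, a posterior that already matches its own group's prior incurs zero KL cost, while the same posterior measured against another group's prior is penalized. That penalty is what keeps the latent regions apart. Averaging decoder parameters only within a group is one simple way to avoid blending texture features across image types.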