Disentangling the Spatial Structure and Style in Conditional VAE.

Ziye Zhang, Li Sun, Zhilin Zheng, Qingli Li
DOI: https://doi.org/10.1109/icip40778.2020.9190908
2019-01-01
Abstract: This paper proposes a structure for the conditional variational autoencoder (cVAE) that disentangles the latent vector into a spatial structure code and a style code. The two codes are complementary: one ($z_s$) is label-relevant and the other ($z_u$) is label-irrelevant. Unlike a traditional cVAE, our network maps the condition label into its relevant code $z_s$ through a separate module. Depending on whether the label directly relates to the image's spatial structure, the $z_s$ output by the condition mapping module is used either as the style code, with two spatial dimensions of $1 \times 1$, or as the spatial structure code, with a single channel. Given the input image and its corresponding $z_s$, the encoder produces a posterior distribution that stays close to a common prior regardless of the label, so the $z_u$ sampled from it is label-irrelevant. The decoder injects $z_s$ and $z_u$ through two typical adaptive normalization modules to reconstruct the input image. Results on two datasets with different types of labels show the effectiveness of our method.
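To make the two code shapes concrete, here is a minimal NumPy sketch of how a $1 \times 1$ style code and a single-channel spatial structure code could each modulate a normalized feature map. The abstract does not name the two adaptive normalization modules, so the instance-normalization base, the per-channel weights `w_gamma` and `w_beta`, and all sizes below are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x has shape (C, H, W); normalize each channel over its spatial positions.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def modulate(x, gamma, beta):
    # Adaptive normalization: scale and shift the normalized features.
    # gamma and beta only need to broadcast against x.
    return gamma * instance_norm(x) + beta

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4  # illustrative sizes (assumption)
features = rng.normal(size=(C, H, W))

# Style code: C channels with spatial size 1x1.
z_style = rng.normal(size=(C, 1, 1))
# Spatial structure code: a single channel of size HxW.
z_struct = rng.normal(size=(1, H, W))

# Hypothetical per-channel weights mapping a code to scale/shift parameters.
w_gamma, w_beta = rng.normal(size=(2, C, 1, 1))

# Style modulation: one (gamma, beta) pair per channel, shared by all positions.
gamma_style = 1.0 + w_gamma * z_style   # shape (C, 1, 1)
beta_style = w_beta * z_style           # shape (C, 1, 1)
out_style = modulate(features, gamma_style, beta_style)

# Structure modulation: (gamma, beta) vary with spatial position.
gamma_struct = 1.0 + w_gamma * z_struct  # shape (C, H, W)
beta_struct = w_beta * z_struct          # shape (C, H, W)
out_struct = modulate(features, gamma_struct, beta_struct)

print(out_style.shape, out_struct.shape)
```

The difference that matters is the shape of the modulation parameters: the style code yields one scale and shift per channel, while the structure code yields a different scale and shift at each spatial position.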