SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing
DOI: https://doi.org/10.1109/cvpr52733.2024.00771
2024-01-01
Computer Vision and Pattern Recognition
Abstract: Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging. Addressing this, we introduce the SSR-Encoder, a novel architecture designed for selectively capturing any subject from single or multiple reference images. It responds to various query modalities, including text and masks, without requiring test-time fine-tuning. The SSR-Encoder combines a Token-to-Patch Aligner, which aligns query inputs with image patches, and a Detail-Preserving Subject Encoder, which extracts and preserves fine features of the subjects, thereby generating subject embeddings. These embeddings, used in conjunction with the original text embeddings, condition the generation process. Characterized by its model generalizability and efficiency, the SSR-Encoder adapts to a range of custom models and control modules. Training is further enhanced by an Embedding Consistency Regularization Loss, and our extensive experiments demonstrate its effectiveness in versatile and high-quality image generation, indicating its broad applicability. Project page: https://ssr-encoder.github.io
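The abstract does not spell out how the Token-to-Patch Aligner and the Detail-Preserving Subject Encoder interact. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the aligner behaves like scaled dot-product attention between one query embedding (for example, an encoded text token) and a set of image patch embeddings, and that a subject embedding is an attention-weighted sum of per-patch detail features. The function names `token_to_patch_weights` and `subject_embedding` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_to_patch_weights(query: Vector, patches: List[Vector]) -> List[float]:
    """Hypothetical aligner (assumption): scaled dot-product attention of one
    query embedding over image patch embeddings, normalized across patches."""
    scale = 1.0 / math.sqrt(len(query))
    return softmax([dot(query, p) * scale for p in patches])


def subject_embedding(weights: List[float], patch_features: List[Vector]) -> Vector:
    """Hypothetical encoder step (assumption): pool per-patch detail features
    using the alignment weights, so patches matching the query dominate."""
    dim = len(patch_features[0])
    out = [0.0] * dim
    for w, feat in zip(weights, patch_features):
        for i in range(dim):
            out[i] += w * feat[i]
    return out


if __name__ == "__main__":
    # Toy example: the query points toward the first patch.
    patches = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    weights = token_to_patch_weights([4.0, 0.0], patches)
    details = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(weights, subject_embedding(weights, details))
```

In this sketch a text or mask query selects a subject simply by shifting attention mass onto the matching patches; the real model's feature extractors, multi-scale features, and the way subject embeddings enter the diffusion model's cross-attention are not reproduced here.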