Abstract: Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks, leveraging diffusion models for 3D consistent editing. However, existing SDS-based 3D editing methods suffer from long training times and produce low-quality results. We identify that the root cause of this performance degradation is their conflict with the sampling dynamics of diffusion models. Addressing this conflict allows us to treat SDS as a diffusion reverse process for 3D editing via sampling from data space. In contrast, existing methods naively distill the score function using diffusion models. From these insights, we propose DreamCatalyst, a novel framework that considers these sampling dynamics in the SDS framework. Specifically, we devise the optimization process of our DreamCatalyst to approximate the diffusion reverse process in editing tasks, thereby aligning with diffusion sampling dynamics. As a result, DreamCatalyst successfully reduces training time and improves editing quality. Our method offers two modes: (1) a fast mode that edits Neural Radiance Fields (NeRF) scenes approximately 23 times faster than current state-of-the-art NeRF editing methods, and (2) a high-quality mode that produces superior results about 8 times faster than these methods. Notably, our high-quality mode outperforms current state-of-the-art NeRF editing methods in terms of both speed and quality. DreamCatalyst also surpasses the state-of-the-art 3D Gaussian Splatting (3DGS) editing methods, establishing itself as an effective and model-agnostic 3D editing solution. See more extensive results on our project page: https://dream-catalyst.github.io.

What problem does this paper attempt to address?

The paper addresses two problems with existing text-driven 3D editing methods: long training times and low editing quality. Specifically, current 3D editing methods based on Score Distillation Sampling (SDS) degrade when editing 3D scenes because they conflict with the sampling dynamics of diffusion models, which makes it difficult to balance editing capability against scene identity preservation. These issues limit the efficiency and effectiveness of 3D editing. To overcome them, the paper proposes a new framework, DreamCatalyst, built on the following improvements:

1. **Rebalancing Identity Preservation and Editing Capability**: The paper introduces a new objective function that rebalances the weights of identity preservation and editing capability at different levels of noise perturbation. Identity preservation is emphasized at high noise perturbation levels, while its weight is reduced at low noise perturbation levels, so that details are synthesized more effectively.
2. **Improved Model Architecture**: The paper introduces the FreeU technique, which enhances editing capability without affecting identity preservation by suppressing high-frequency features and amplifying low-frequency features. This improves editing quality and avoids additional computational and memory overhead.
3. **Fast and Efficient 3D Editing**: DreamCatalyst offers two modes: a fast mode and a high-quality mode. The fast mode is approximately 23 times faster than existing methods, while the high-quality mode is about 8 times faster than existing methods, with significant improvements in editing quality.

In summary, by redesigning the objective function and model architecture, the paper addresses the long training times and low editing quality of existing 3D editing methods, achieving efficient and high-quality 3D scene editing.
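The rebalancing idea in the first improvement can be illustrated with a minimal Python sketch. This is not the paper's objective function: the linear schedule, the function names `identity_weight` and `combined_objective`, and the default weight values are all hypothetical, chosen only to show an identity-preservation weight that grows with the noise level and shrinks as the noise level falls.

```python
def identity_weight(t, w_low=0.1, w_high=1.0):
    """Hypothetical identity-preservation weight (an assumption, not the paper's formula).

    t is a normalized noise level in [0, 1], where 1 means the heaviest noise.
    The weight rises linearly from w_low at t = 0 to w_high at t = 1.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return w_low + (w_high - w_low) * t


def combined_objective(identity_term, editing_term, t, edit_weight=1.0):
    """Weighted sum of an identity term and an editing term at noise level t.

    Both terms are assumed to be precomputed scalar losses; edit_weight is
    held constant here purely to keep the illustration simple.
    """
    return identity_weight(t) * identity_term + edit_weight * editing_term


if __name__ == "__main__":
    # Print how the balance shifts as the noise level decreases.
    for t in (1.0, 0.5, 0.0):
        print(t, identity_weight(t), combined_objective(2.0, 3.0, t))
```

Under this assumed schedule the identity term dominates early, high-noise steps and fades at low noise, leaving the editing term to shape fine details. The actual weighting functions may differ and should be taken from the paper itself.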

DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion

Enhanced 3D Generation by 2D Editing

DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation

MicroDreamer: Efficient 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

ED-NeRF: Efficient Text-Guided Editing of 3D Scene with Latent Space NeRF

EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

4D-Editor: Interactive Object-level Editing in Dynamic Neural Radiance Fields via Semantic Distillation