Abstract: As demand from the film and gaming industries for 3D scenes with target styles grows, the importance of advanced 3D stylization techniques increases. However, recent methods often struggle to maintain local consistency in color and texture throughout stylized scenes, which is essential for maintaining aesthetic coherence. To solve this problem, this paper introduces ArtNVG, an innovative 3D stylization framework that efficiently generates stylized 3D scenes by leveraging reference style images. Built on 3D Gaussian Splatting (3DGS), ArtNVG achieves rapid optimization and rendering while upholding high reconstruction quality. Our framework realizes high-quality 3D stylization by incorporating two pivotal techniques: Content-Style Separated Control and Attention-based Neighboring-View Alignment. Content-Style Separated Control uses the CSGO model and the Tile ControlNet to decouple the content and style control, reducing risks of information leakage. Concurrently, Attention-based Neighboring-View Alignment ensures consistency of local colors and textures across neighboring views, significantly improving visual quality. Extensive experiments validate that ArtNVG surpasses existing methods, delivering superior results in content preservation, style alignment, and local consistency.

What problem does this paper attempt to address?

The film and game industries increasingly demand 3D scenes rendered in specific styles, but existing 3D stylization methods struggle to keep local colors and textures consistent. To address this, the paper proposes ArtNVG, a 3D stylization framework that efficiently generates stylized 3D scenes from reference style images. Specifically, existing methods often process each view independently when handling multi-view images, which leads to inconsistency between adjacent views and to information leakage. ArtNVG addresses both issues with two mechanisms, Content-Style Separated Control and Attention-based Neighboring-View Alignment, thereby improving visual quality and reducing the risk of information leakage.

### Key Technical Points

1. **Content-Style Separated Control**:
   - Use the CSGO model and Tile ControlNet to separate content and style features and reduce the risk of information leakage.
   - Pass content and style control signals through the cross-attention layer so that no unnecessary information is mixed in during style transfer.
2. **Attention-based Neighboring-View Alignment**:
   - Replace the self-attention layer with a neighboring-view attention layer so that local information can be shared between adjacent views.
   - Keep colors and textures of the generated stylized images consistent across views, significantly enhancing the visual effect.

### Method Overview

The workflow of ArtNVG is as follows:

1. **Render Content Image**: Render the content image from the original 3D Gaussian Splatting (3DGS) scene.
2. **Encode Content and Style**: Use the projection module of CSGO and Tile ControlNet to encode the content image and the style image, respectively, to obtain content and style control signals.
3. **Neighboring-View Clustering**: Cluster the neighboring views of the content image to ensure local consistency.
4. **Denoising Process**: Control the content mainly in the down-sampling blocks of the UNet and the style in the up-sampling blocks.
5. **Fine-tune 3D Scene**: Use the generated stylized images to fine-tune the original 3D scene, applying the NNFM loss function to enhance style alignment and detail representation.

### Experimental Results

In experiments on the Tanks and Temples dataset, ArtNVG outperforms existing methods in content fidelity, style alignment, and visual quality. Both quantitative and qualitative results show higher content fidelity (CFSD), better style similarity (CSD), and stronger multi-view consistency (CLIP-DC) than existing methods.

### Summary

ArtNVG effectively addresses the local consistency problem in 3D stylization by introducing the Content-Style Separated Control and Attention-based Neighboring-View Alignment mechanisms, improving the quality and visual effect of stylized 3D scenes.
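### Illustrative Sketch: Neighboring-View Attention

The neighboring-view attention idea listed under Key Technical Points can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function name `neighboring_view_attention`, the use of identity query/key/value projections, and the plain-list token layout are all assumptions made for brevity. The only point it shows is that each view's queries attend over keys and values pooled from every view in the same cluster, instead of only its own tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def neighboring_view_attention(views):
    """Attention where every view attends to tokens of all clustered views.

    views: list of V views, each a list of N token vectors (lists of floats).
    Assumption: queries, keys and values are the raw tokens (identity
    projections); a real layer would apply learned projections first.
    Returns a nested list with the same shape as `views`.
    """
    # Keys/values are pooled across every view in the cluster.
    pooled = [token for view in views for token in view]
    dim = len(pooled[0])
    scale = 1.0 / math.sqrt(dim)

    output = []
    for view in views:
        view_out = []
        for query in view:
            weights = softmax([dot(query, key) * scale for key in pooled])
            # Weighted average of the pooled values, one channel at a time.
            mixed = [
                sum(w * value[c] for w, value in zip(weights, pooled))
                for c in range(dim)
            ]
            view_out.append(mixed)
        output.append(view_out)
    return output


# Two single-token views with different content: each output token now
# carries some of the other view's information.
shared = neighboring_view_attention([[[1.0, 0.0]], [[0.0, 1.0]]])
```

Passing a single view reduces this to ordinary self-attention, which makes the difference easy to see: with one view the token `[1.0, 0.0]` is returned unchanged, while with a second view in the cluster its output picks up a nonzero second component from the neighbor.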

ArtNVG: Content-Style Separated Artistic Neighboring-View Gaussian Stylization

StylizedGS: Controllable Stylization for 3D Gaussian Splatting

Reference-based Controllable Scene Stylization with Gaussian Splatting

Gaussian Splatting in Style

G-Style: Stylized Gaussian Splatting

StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting

Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation

WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians

PNeSM: Arbitrary 3D Scene Stylization via Prompt-Based Neural Style Mapping

StyleSplat: 3D Object Style Transfer with Gaussian Splatting

AI-Driven Stylization of 3D Environments

ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

StyleCity: Large-Scale 3D Urban Scenes Stylization

Advances in 3D Neural Stylization: A Survey

3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization

GaussianStyle: Gaussian Head Avatar via StyleGAN

Stylizing Sparse-View 3D Scenes with Hierarchical Neural Representation

Fast 3D Stylized Gaussian Portrait Generation From a Single Image With Style Aligned Sampling Loss

InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting

4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting

HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks