MS2Mesh-XR: Multi-modal Sketch-to-Mesh Generation in XR Environments

Yuqi Tong, Yue Qiu, Ruiyang Li, Shi Qiu, Pheng-Ann Heng

2024-12-12

Abstract: We present MS2Mesh-XR, a novel multi-modal sketch-to-mesh generation pipeline that enables users to create realistic 3D objects in extended reality (XR) environments using hand-drawn sketches assisted by voice inputs. Specifically, users can intuitively sketch objects using natural hand movements in mid-air within a virtual environment. By integrating voice inputs, we employ ControlNet to infer realistic images based on the drawn sketches and interpreted text prompts. Users can then review and select their preferred image, which is subsequently reconstructed into a detailed 3D mesh using the Convolutional Reconstruction Model. Notably, our proposed pipeline can generate a high-quality 3D mesh in less than 20 seconds, allowing for immersive visualization and manipulation in run-time XR scenes. We demonstrate the practicability of our pipeline through two use cases in XR settings. By leveraging natural user inputs and cutting-edge generative AI capabilities, our approach can significantly facilitate XR-based creative production and enhance user experiences. Our code and demo will be available at: https://yueqiu0911.github.io/MS2Mesh-XR/

Computer Vision and Pattern Recognition, Human-Computer Interaction, Multimedia

What problem does this paper attempt to address?

This paper addresses the difficulty of creating high-quality 3D objects in extended reality (XR) environments. Specifically, existing 3D content generation methods have two main problems:

1. **High user skill requirements**: Most existing methods require advanced drawing skills, which is a high barrier for ordinary users.
2. **Inaccuracy in interactive drawing**: Device and technology limitations make it difficult to produce high-fidelity 3D models from interactive drawing in XR scenes, especially models that require fine details.

To solve these problems, the paper proposes MS2Mesh-XR, a new multi-modal sketch-to-mesh generation pipeline. It allows users to intuitively create high-quality 3D objects in XR environments through natural hand-drawn sketches and voice input. The main features of the method are:

- **Multi-modal input**: Hand-drawn sketches are combined with voice prompts to capture the user's intent more accurately.
- **Rapid generation**: A 3D mesh is generated in less than 20 seconds and can be imported immediately into run-time XR scenes.
- **High-quality output**: ControlNet and the Convolutional Reconstruction Model are used to generate detailed, textured 3D meshes.

In this way, MS2Mesh-XR aims to improve the efficiency and quality of 3D content generation as well as the user experience and interactivity, especially in virtual reality (VR) and mixed reality (MR) applications.
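The stages summarized above (voice to text prompt, sketch plus prompt to candidate images, user selection, image to mesh) can be sketched as a small orchestration function. This is a minimal illustration, not the authors' implementation: `run_sketch_to_mesh`, `PipelineResult`, and the four stage callables are hypothetical names, and the real ControlNet and Convolutional Reconstruction Model calls are replaced by stand-in functions passed in by the caller. Excluding the user's review time from the timing total is also an assumption made here.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PipelineResult:
    prompt: str
    selected_image: Any
    mesh: Any
    timings: Dict[str, float] = field(default_factory=dict)

    def generation_seconds(self) -> float:
        # Sum only the automated stages; time spent by the user
        # choosing an image is excluded (an assumption of this sketch).
        return sum(t for name, t in self.timings.items() if name != "select")


def run_sketch_to_mesh(
    sketch: Any,
    speech: Any,
    transcribe: Callable[[Any], str],                      # hypothetical: voice -> text prompt
    generate_images: Callable[[Any, str, int], List[Any]],  # hypothetical: sketch + prompt -> images
    select: Callable[[List[Any]], int],                     # hypothetical: user picks an index
    reconstruct: Callable[[Any], Any],                      # hypothetical: image -> 3D mesh
    num_candidates: int = 4,
) -> PipelineResult:
    timings: Dict[str, float] = {}

    def timed(name: str, fn: Callable[..., Any], *args: Any) -> Any:
        # Run one stage and record its wall-clock duration.
        start = time.perf_counter()
        out = fn(*args)
        timings[name] = time.perf_counter() - start
        return out

    prompt = timed("transcribe", transcribe, speech)
    candidates = timed("generate", generate_images, sketch, prompt, num_candidates)
    index = timed("select", select, candidates)
    mesh = timed("reconstruct", reconstruct, candidates[index])
    return PipelineResult(prompt, candidates[index], mesh, timings)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any model weights.
    result = run_sketch_to_mesh(
        sketch=[(0.0, 0.0, 0.0), (0.1, 0.2, 0.0)],
        speech=b"\x00\x01",
        transcribe=lambda audio: "a wooden chair",
        generate_images=lambda s, p, n: [f"{p} #{i}" for i in range(n)],
        select=lambda images: 1,
        reconstruct=lambda image: {"source": image, "vertices": [], "faces": []},
    )
    print(result.prompt, result.selected_image, result.generation_seconds() < 20.0)
```

Passing the stages in as callables keeps the control flow independent of any particular model backend, so each stand-in can be swapped for a real speech recognizer, image generator, or reconstruction model without changing the orchestration.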

Rapid 3D Model Generation with Intuitive 3D Input

High-Fidelity 3D Model Generation with Relightable Appearance from Single Freehand Sketches and Text Guidance

Using the CAT for 3D Sketching in Front of Large Displays

SweepCanvas: Sketch-based 3D Prototyping on an RGB-D Image

Reality3DSketch: Rapid 3D Modeling of Objects from Single Freehand Sketches

Magic3DSketch: Create Colorful 3D Models From Sketch-Based 3D Modeling Guided by Text and Language-Image Pre-Training

Deep3DSketch++: High-Fidelity 3D Modeling from Single Free-hand Sketches

GesMoSketch: A System for 3D Sketching in AR with One Mobile Device

Deep3DSketch+: Rapid 3D Modeling from Single Free-hand Sketches

Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication

Interactive Mesh Sculpting with Arbitrary Topologies in Head-Mounted VR Environments

Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

SketchMetaFace: A Learning-based Sketching Interface for High-fidelity 3D Character Face Modeling

SingleSketch2Mesh: Generating 3D Mesh model from Sketch

Deep3DSketch-im: rapid high-fidelity AI 3D model generation by single freehand sketches

AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction

Robust Dual-Modal Speech Keyword Spotting for XR Headsets

Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI

From sketch to reality: precision-friendly 3D generation technology