SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending

Nels Numan, Shwetha Rajaram, Balasaravanan Thoravi Kumaravel, Nicolai Marquardt, Andrew D. Wilson
DOI: https://doi.org/10.1145/3654777.3676361
2024-09-21
Abstract: There is increased interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce artificial environments, falling short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that utilizes generative AI techniques to blend users' physical surroundings into unified virtual spaces. This pipeline transforms user-provided 2D images into context-rich 3D environments through an iterative process consisting of depth estimation, mesh alignment, and diffusion-based space completion guided by geometric priors and adaptive text prompts. In a preliminary within-subjects study, where 20 participants performed a collaborative VR affinity diagramming task in pairs, we compared SpaceBlender with a generic virtual environment and a state-of-the-art scene generation framework, evaluating its ability to create virtual spaces suitable for collaboration. Participants appreciated the enhanced familiarity and context provided by SpaceBlender but also noted complexities in the generative environments that could detract from task focus. Drawing on participant feedback, we propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.
Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses how generative artificial intelligence (generative AI) can be used to create 3D environments suitable for virtual reality (VR) telepresence. Existing generative models usually produce fully synthetic environments, which fall short for collaborative tasks that benefit from incorporating the user's physical context. In addition, the 3D meshes these models generate have core usability issues when used as VR environments, such as non-navigable paths, distracting visual and geometric artifacts, and disturbing spaces.

To solve these problems, the paper introduces **SpaceBlender**, a pipeline that uses generative AI techniques to blend users' physical surroundings into a unified virtual space for VR telepresence applications. The main objectives are:

1. **Integrate the physical environments of multiple users**: Create a coherent and rich 3D environment by processing and aligning multiple images from different viewpoints and positions.
2. **Improve the coherence and realism of scene integration**: Ensure that the generated virtual space has natural transitions and a consistent geometric structure, reducing inconsistent boundaries and artifacts.
3. **Automate the generation process**: Allow users to create a blended environment without extensive manual configuration, simplifying virtual environment design for VR telepresence applications.
4. **Meet the core usability requirements of VR**: Ensure that the generated environment is easy to navigate and view, improving the user experience.

### Overview of research methods

To achieve these goals, SpaceBlender adopts the following key technical steps:

- **Depth estimation and 3D mesh generation**: Estimate depth values from 2D images and back-project them into 3D space to generate an initial 3D sub-mesh.
- **Floor alignment**: Use the RANSAC algorithm to identify and align the floor plane of each sub-mesh, ensuring that all sub-meshes rest on the same horizontal plane.
- **Layout generation**: Position the sub-meshes using a parameterized layout technique to form an open-space structure.
- **Creation of geometric prior mesh**: Generate a convex hull from the sub-mesh layout to define the shape of the blended space.
- **Adaptive text-prompt inference**: Combine a vision-language model (VLM) and a large language model (LLM) to automatically generate text prompts that describe the blended regions and guide scene generation.

Through these steps, SpaceBlender creates a virtual environment that contains real-world elements familiar to users and is suitable for collaborative tasks.

### Preliminary evaluation

To evaluate whether the environments generated by SpaceBlender are suitable for collaborative tasks, the researchers conducted a preliminary user study. Twenty participants worked in pairs and completed a VR affinity diagramming task in three different virtual environments:

1. **Generic3D**: A general-purpose low-polygon room.
2. **Text2Room**: An environment generated by an existing generative model.
3. **SpaceBlender**: An environment that integrates images of familiar physical locations provided by participants.

The results show that participants experienced higher physical comfort and navigability in the SpaceBlender environment, and some participants used recognizable environmental features while completing the task. However, they also pointed out that the visual quality and realism of the generated environment need to improve to better support future application scenarios.
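
To make the first two steps listed in the overview of research methods more concrete, the following is a minimal sketch of depth back-projection and RANSAC-based floor alignment. It is an illustration only, not the authors' implementation. The pinhole camera model, the intrinsics `fx`, `fy`, `cx`, `cy`, the function names, the distance threshold, and the assumption that the camera sits above the floor are all hypothetical choices made for this example.

```python
import math
import random


def back_project(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of depths) into camera-space 3D points.

    Assumes a pinhole camera: X = (u - cx) * d / fx, Y = (v - cy) * d / fy, Z = d.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:  # skip pixels without a valid depth estimate
                points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ransac_floor(points, threshold=0.02, iterations=200, seed=0):
    """Fit the plane n.p + d = 0 with the most inliers; return (n, d, inliers).

    The normal is oriented so the camera origin lies on its positive side
    (assumption: the camera is above the floor).
    """
    rng = random.Random(seed)
    best = (None, 0.0, [])
    for _ in range(iterations):
        p0, p1, p2 = rng.sample(points, 3)
        n = _cross(_sub(p1, p0), _sub(p2, p0))
        length = math.sqrt(_dot(n, n))
        if length < 1e-9:
            continue  # the three samples are (nearly) collinear
        n = (n[0] / length, n[1] / length, n[2] / length)
        d = -_dot(n, p0)
        if d < 0:  # flip so that the origin satisfies n.0 + d > 0
            n, d = (-n[0], -n[1], -n[2]), -d
        inliers = [p for p in points if abs(_dot(n, p) + d) < threshold]
        if len(inliers) > len(best[2]):
            best = (n, d, inliers)
    return best


def align_floor(points, normal, offset):
    """Rotate points so `normal` maps to +Y, then shift the floor to y = 0."""
    up = (0.0, 1.0, 0.0)
    axis = _cross(normal, up)
    s = math.sqrt(_dot(axis, axis))  # sine of the rotation angle
    c = _dot(normal, up)             # cosine of the rotation angle

    def rotate(p):
        if s < 1e-9:  # normal already parallel to +Y or -Y
            return p if c > 0 else (p[0], -p[1], -p[2])
        k = (axis[0] / s, axis[1] / s, axis[2] / s)
        kxp, kdp = _cross(k, p), _dot(k, p)
        # Rodrigues' rotation formula
        return tuple(p[i] * c + kxp[i] * s + k[i] * kdp * (1 - c)
                     for i in range(3))

    aligned = []
    for p in points:
        x, y, z = rotate(p)
        aligned.append((x, y + offset, z))
    return aligned
```

Applying `ransac_floor` and `align_floor` to each sub-mesh independently would place every floor at height zero with a shared up axis, which is one way to satisfy the requirement that all sub-meshes rest on the same horizontal plane before layout generation.
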
In conclusion, SpaceBlender offers a new approach to creating virtual collaborative spaces that incorporate users' physical context, and it lays the foundation for further research on applying generative AI tools to blended environments.