PanoTree: Autonomous Photo-Spot Explorer in Virtual Reality Scenes

Tomohiro Hayase, Sacha Braun, Hikari Yanagawa, Itsuki Orito, Yuichi Hiroi
2024-06-12
Abstract: Social VR platforms enable social, economic, and creative activities by allowing users to create and share their own virtual spaces. In social VR, photography within a VR scene is an important indicator of visitors' activities. Although automatic identification of photo spots within a VR scene can facilitate the process of creating a VR scene and enhance the visitor experience, there are challenges in quantitatively evaluating photos taken in the VR scene and efficiently exploring the large VR scene. We propose PanoTree, an automated photo-spot explorer in VR scenes. To assess the aesthetics of images captured in VR scenes, a deep scoring network is trained on a large dataset of photos collected by a social VR platform to determine whether humans are likely to take similar photos. Furthermore, we propose a Hierarchical Optimistic Optimization (HOO)-based search algorithm to efficiently explore 3D VR spaces with the reward from the scoring network. Our user study shows that the scoring network achieves human-level performance in distinguishing randomly taken images from those taken by humans. In addition, we show applications using the explored photo spots, such as automatic thumbnail generation, support for VR world creation, and visitor flow planning within a VR scene.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
This paper addresses the problem of automatically identifying and exploring the best photo-taking points in virtual reality (VR) scenes. It faces two main challenges:

1. **Quantitative evaluation of photos**: A method is needed to evaluate whether an image captured in a VR scene is one a human would be likely to take. This amounts to quantifying the aesthetic quality of the image.
2. **Efficient exploration of the entire VR scene**: A VR scene is a very large space, and the camera's position and orientation have 6 degrees of freedom (DoF), so searching for the best photo-taking points is computationally intensive.

To solve these problems, the paper proposes PanoTree, an automated photo-taking point exploration system based on deep learning and the Hierarchical Optimistic Optimization (HOO) algorithm. The following is a summary of the specific content:

### Problem background

- **Social VR platforms**: Social VR platforms allow users to create and share their own virtual spaces, and photography has become an important indicator of visitor activity.
- **Challenges**: Automatically identifying photo-taking points in VR scenes can simplify VR scene creation and enhance the visitor experience, but it is difficult to quantitatively evaluate the quality of photos taken in VR scenes and to explore large-scale VR scenes efficiently.

### Solutions

#### 1. Scoring Network

- **Training data**: The network is trained on a large-scale photo dataset collected from the social VR platform Cluster.
- **Function**: The network predicts whether the input image is likely to have been taken by a human, and this prediction serves as a measure of the image's aesthetic quality.

#### 2. Hierarchical Optimistic Optimization (HOO) algorithm

- **Space division**: The VR scene is hierarchically divided into sub-regions to gradually narrow the search range.
- **Exploration strategy**: The score output by the scoring network is used as the reward signal, and photo-taking points in 3D space are explored efficiently by treating the search as a multi-armed bandit problem.

### Main contributions

- **Proposing PanoTree**: A system that autonomously explores photo-taking points in VR scenes.
- **Developing a scoring network**: A deep neural network trained on a large-scale dataset of photos taken by real users, used to evaluate the aesthetic quality of photos in VR scenes.
- **Efficient hierarchical search algorithm**: Efficient hierarchical spatial search for photo-taking points, achieved through implicit supervision.
- **Prototype implementation and evaluation**: A prototype of the PanoTree search is implemented, the exploration quality is discussed, and potential application scenarios are presented.

### Formula explanation

The key formulas in the description are as follows:

- **Scoring network output**:
  \[ f_S: \mathbb{R}^{H \times W \times C} \rightarrow [0, 1] \]
  where \( H \) and \( W \) are the height and width of the image, and \( C \) is the number of channels.
- **Upper Confidence Bound (UCB)**:
  \[ U_{d,i}(n) = \hat{\mu}_{d,i}(n) + c_s \sqrt{\frac{2 \log N}{T_{d,i}(n)}} + \nu_1 \rho^d \]
  where \( \hat{\mu}_{d,i}(n) \) is the empirical mean reward of node \( (d, i) \), \( T_{d,i}(n) \) is the number of times the node has been visited, and \( c_s > 0 \), \( \nu_1 > 0 \), \( 0 < \rho \leq 1 \) are hyperparameters.
- **B-value update**:
  \[ B_{d,i}(n) = \min \left\{ U_{d,i}(n), \max \left\{ B_{d + 1,2i}(n), B_{d + 1,2i + 1}(n) \right\} \right\} \]

The scoring network supplies the reward, while the UCB and B-values determine which sub-region the search samples next.
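As a rough illustration of how the UCB and B-value formulas interact, the following minimal Python sketch runs an HOO-style search on the one-dimensional interval [0, 1]. It is not the authors' implementation. The real system searches camera poses in a 3D scene and uses the trained scoring network as the reward, whereas here `toy_reward` is a hypothetical stand-in for \( f_S \), \( N \) is assumed to be the current round number, and only the nodes on the sampled path are refreshed each round (a simplification of the full update). All names and default hyperparameter values are assumptions chosen for illustration.

```python
import math
import random


class Node:
    """One cell of the binary partition: the interval [lo, hi] at a given depth."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0        # T_{d,i}(n): samples drawn inside this cell
        self.mean = 0.0       # empirical mean reward of this cell
        self.b = math.inf     # B-value; unexplored cells stay optimistic
        self.children = None  # (left, right) once the cell has been expanded


def toy_reward(x):
    """Hypothetical stand-in for the scoring network: peaks at x = 0.7, range [0, 1]."""
    return 1.0 - abs(x - 0.7)


def hoo_search(reward, n_iter=300, c_s=1.0, nu1=1.0, rho=0.5, seed=0):
    rng = random.Random(seed)
    root = Node(0.0, 1.0, 0)
    best_x, best_r = None, -math.inf
    for n in range(1, n_iter + 1):
        # 1. Descend from the root, following the child with the larger B-value.
        path, node = [root], root
        while node.children is not None:
            left, right = node.children
            node = left if left.b >= right.b else right
            path.append(node)
        # 2. Sample a point in the selected leaf, score it, and split the leaf.
        x = rng.uniform(node.lo, node.hi)
        r = reward(x)
        mid = 0.5 * (node.lo + node.hi)
        node.children = (Node(node.lo, mid, node.depth + 1),
                         Node(mid, node.hi, node.depth + 1))
        if r > best_r:
            best_x, best_r = x, r
        # 3. Update visit counts and running means along the path.
        for v in path:
            v.count += 1
            v.mean += (r - v.mean) / v.count
        # 4. Recompute U and B bottom-up along the path (simplified refresh).
        for v in reversed(path):
            u = (v.mean
                 + c_s * math.sqrt(2.0 * math.log(n) / v.count)
                 + nu1 * rho ** v.depth)
            left, right = v.children
            v.b = min(u, max(left.b, right.b))
    return best_x, best_r


best_x, best_r = hoo_search(toy_reward)
```

In this sketch the best sampled point should end up close to the peak of `toy_reward`. Extending it toward the setting described above would mean replacing the interval with cells of the scene's pose space and replacing `toy_reward` with a render-and-score step.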