Scene Depth Estimation from Traditional Oriental Landscape Paintings

Sungho Kang, YeongHyeon Park, Hyunkyu Park, Juneho Yi

2024-03-07

Abstract: Scene depth estimation from paintings can streamline the process of 3D sculpture creation so that visually impaired people can appreciate the paintings by touch. However, measuring the depth of oriental landscape painting images is extremely challenging because of their unique method of depicting depth and their poor preservation. To address the problem of scene depth estimation from oriental landscape painting images, we propose a novel framework that consists of a two-step image-to-image translation method, with CLIP-based image matching at the front end, to predict the real scene image that best matches the given oriental landscape painting image. We then apply a pre-trained SOTA depth estimation model to the generated real scene image. In the first step, CycleGAN converts an oriental landscape painting image into a pseudo-real scene image. We use CLIP to semantically match landscape photo images with an oriental landscape painting image so that CycleGAN can be trained in an unsupervised manner. In the second step, the pseudo-real scene image and the oriental landscape painting image are fed into DiffuseIT to predict a final real scene image. Finally, we measure the depth of the generated real scene image using a pre-trained depth estimation model such as MiDaS. Experimental results show that our approach performs well enough to predict real scene images corresponding to oriental landscape painting images. To the best of our knowledge, this is the first study to measure the depth of oriental landscape painting images. Our research can potentially help visually impaired people experience paintings in diverse ways. We will release our code and the resulting dataset.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the problem of estimating scene depth from traditional oriental landscape paintings. Applying a pre-trained state-of-the-art (SOTA) depth estimation model directly to an oriental landscape painting image usually yields a meaningless depth map, because such models are typically trained on real scene images. Two properties of the paintings cause this. First, they depict depth with their own techniques, such as the "Three Distances Method." Second, because of their age, many of them are poorly preserved, which leads to blurred edges and weak contrast. To overcome this, the authors propose a framework that predicts the real scene image that best matches a given oriental landscape painting image through a two-step image-to-image (I2I) translation method. First, CycleGAN converts the oriental landscape painting image into a pseudo-real scene image. Then, DiffuseIT takes both the original painting image and the pseudo-real scene image and predicts the final real scene image. Finally, a pre-trained depth estimation model (such as MiDaS) measures the depth of the generated real scene image. The method aims to support tactile appreciation of paintings by visually impaired people and could help automate 3D sculpture creation.
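The data flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (cosine_similarity, match_photos, estimate_depth) are hypothetical, the three models are passed in as plain callables rather than loaded from any real library, and the use of cosine similarity over precomputed CLIP image embeddings is an assumption, since the source only says that CLIP is used to semantically match landscape photos with a painting.

```python
import math
from typing import Any, Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_photos(painting_emb: Vector,
                 photo_embs: Sequence[Vector],
                 top_k: int = 1) -> List[int]:
    """Return indices of the top_k photos most similar to the painting.

    Assumption: the embeddings were produced beforehand by a CLIP image
    encoder. The matched photos would form the photo side of the unpaired
    training set for the painting-to-photo translator.
    """
    scores = [cosine_similarity(painting_emb, p) for p in photo_embs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


def estimate_depth(painting: Any,
                   to_pseudo_real: Callable[[Any], Any],
                   refine: Callable[[Any, Any], Any],
                   depth_model: Callable[[Any], Any]) -> Any:
    """Chain the stages: painting -> pseudo-real -> real scene -> depth map.

    All three callables are hypothetical stand-ins:
      to_pseudo_real: a trained CycleGAN-style generator,
      refine:         a DiffuseIT-style model that receives both the
                      pseudo-real image and the original painting,
      depth_model:    a pre-trained monocular depth estimator such as MiDaS.
    """
    pseudo_real = to_pseudo_real(painting)      # step 1 of the I2I translation
    real_scene = refine(pseudo_real, painting)  # step 2 of the I2I translation
    return depth_model(real_scene)              # depth of the generated scene
```

Passing the models in as callables keeps the sketch independent of any particular checkpoint or framework; in practice each one would wrap a trained network and operate on image tensors.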

