Virtual-real Occlusion Handling Technologies in Augmented Reality
Wu Yuhui, Li Xiaojuan, Liu Yue
DOI: https://doi.org/10.11834/jig.240045
2024-01-01
Abstract: With the rapid development of software technology and the continuous updating of hardware devices, augmented reality technology has gradually matured and been widely used in various fields, such as military, medical, gaming, industry, and education. Accurate depth perception is crucial in augmented reality, and simply overlaying virtual objects onto video sequences no longer meets user demands. In many augmented reality scenarios, users need to interact with virtual objects constantly, and without accurate depth perception, augmented reality can hardly provide a seamless interactive experience. Virtual-real occlusion handling is one of the key factors in achieving this goal. It presents a realistic virtual-real fusion effect by establishing accurate occlusion relationships, so that the fused scene correctly reflects the spatial position relationship between virtual and real objects, thereby enhancing the user's sense of immersion and realism. This paper first introduces the related background, concepts, and overall processing flow of virtual-real occlusion handling. Existing occlusion handling methods can be divided into three categories: depth based, image analysis based, and model based. By analyzing the distinct characteristics of rigid and nonrigid objects, we summarize the specific principles, representative research works, and applicability to rigid and nonrigid objects of these three types of virtual-real occlusion handling methods. The shape and size of rigid objects remain unchanged under motion or applied force, and occlusion handling for them mainly relies on two types of methods: depth based and model based. The depth-based methods have evolved from the early use of stereo vision algorithms to the use of depth sensors for indoor depth image acquisition and further to the prediction of moving objects' depth by using outdoor map data, as well as the densification of sparse simultaneous localization and mapping depth in monocular mobile augmented reality. Further research should focus on
depth image restoration algorithms and on the balance between real-time performance and accuracy in dense scene depth computation algorithms for mobile augmented reality. The model-based methods have developed from constructing partial 3D models by segmenting object contours in video key frames or directly using modeling software to achieving dense reconstruction of indoor static scenes using depth images and constructing approximate 3D models of outdoor scenes by incorporating geographic spatial information. Model-based methods already have a relatively well-established processing flow, but further exploration is still needed on how to enhance real-time performance while ensuring tracking and occlusion accuracy. In contrast to rigid objects, nonrigid objects are prone to irregular deformations during movement. Typical nonrigid objects in augmented reality are the user's hands or the bodies of other users. For nonrigid objects, related research has been conducted on all three types of virtual-real occlusion handling methods. Depth-based methods focus on depth image restoration algorithms. These algorithms aim to repair depth image noise while ensuring precise alignment between the depth and RGB images, especially in extreme scenarios, such as when the foreground and background have similar colors. Image analysis-based methods focus on foreground segmentation algorithms and on the means of judging occlusion relationships. Foreground segmentation algorithms have evolved from the early color models and background subtraction techniques to deep learning-based segmentation networks. Moreover, the means of judging occlusion relationships have transitioned from user specification to the incorporation of depth information to assist judgment. The key challenge in image analysis-based methods lies in overcoming the irregular deformations of nonrigid objects, obtaining accurate foreground segmentation masks, and tracking continuously. Model-based methods initially used LeapMotion combined with customized hand parameters to
fit a hand model, but reconstructing hand models with deep learning networks has now become mainstream. Model-based methods should improve the speed and accuracy of hand reconstruction. On the basis of summarizing the virtual-real occlusion handling methods for rigid and nonrigid objects, we also conduct a comparative analysis of existing methods from various perspectives, including real-time performance, automation level, support for viewpoint or scene changes, and application scope. In addition, we summarize the specific workflows, difficulties, and limitations of the three types of virtual-real occlusion handling methods. Finally, in view of the open problems in related research, we explore the challenges faced by current virtual-real occlusion technology and propose potential future research directions: 1) Occlusion handling for moving nonrigid objects. Obtaining accurate depth or 3D models of nonrigid objects is the key to solving this problem. The accuracy and robustness of hand segmentation must be further improved. Additionally, the use of simpler monocular depth estimation and the rapid reconstruction of nonrigid objects other than the user's hands need further exploration. 2) Occlusion handling for outdoor dynamic scenes. Existing depth cameras have a limited working range, which makes them ineffective in outdoor scenes. Sparse 3D models obtained from geographic information systems have low precision and cannot be applied to dynamic objects, such as automobiles. Therefore, further research on virtual-real occlusion handling for dynamic objects in large outdoor scenes is needed. 3) Registration algorithms for depth and RGB images. The accuracy of edge alignment between depth and color images must be improved without consuming excessive computing resources.
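
As an illustration of the depth-based category described in the abstract, the following minimal Python sketch composites a virtual layer over a real image with a per-pixel depth comparison. It is not taken from the paper: the function name `composite_with_depth`, the nested-list image layout, and the choice to treat a missing real depth reading (`None`) as "infinitely far" are all assumptions made for this example. A practical system would first restore and register the depth image, which is the step the abstract identifies as the main open problem.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (r, g, b)
Image = List[List[Pixel]]
DepthMap = List[List[Optional[float]]]  # metres from camera; None = no value


def composite_with_depth(real_rgb: Image, real_depth: DepthMap,
                         virt_rgb: Image, virt_depth: DepthMap) -> Image:
    """Show a virtual pixel only where it is closer to the camera than the
    real surface at the same pixel (hypothetical helper for illustration)."""
    out: Image = []
    for y, row in enumerate(real_rgb):
        out_row = []
        for x, real_px in enumerate(row):
            vd = virt_depth[y][x]   # None: no virtual content at this pixel
            rd = real_depth[y][x]   # None: sensor hole (assumed "far away")
            virtual_visible = vd is not None and (rd is None or vd < rd)
            out_row.append(virt_rgb[y][x] if virtual_visible else real_px)
        out.append(out_row)
    return out


# One-row example: a grey real scene and a red virtual object at 2 m.
grey, red = (10, 10, 10), (255, 0, 0)
real_rgb = [[grey, grey, grey]]
real_depth = [[1.0, 3.0, None]]       # real surface at 1 m, 3 m, unknown
virt_rgb = [[red, red, red]]
virt_depth = [[2.0, 2.0, None]]       # virtual object covers first two pixels
result = composite_with_depth(real_rgb, real_depth, virt_rgb, virt_depth)
```

In the first pixel the real surface (1 m) occludes the virtual object (2 m); in the second the virtual object is in front of the real surface (3 m); the third pixel has no virtual content. Noisy or misaligned depth at object edges would produce visible artifacts with this naive test, which is why the abstract stresses depth restoration and depth-RGB registration.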