AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos

Yuze He, Wang Zhao, Shaohui Liu, Yubin Hu, Yushi Bai, Yu-Hui Wen, Yong-Jin Liu
2024-11-30
Abstract: We introduce AlphaTablets, a novel and generic representation of 3D planes that features continuous 3D surface and precise boundary delineation. By representing 3D planes as rectangles with alpha channels, AlphaTablets combine the advantages of current 2D and 3D plane representations, enabling accurate, consistent and flexible modeling of 3D planes. We derive differentiable rasterization on top of AlphaTablets to efficiently render 3D planes into images, and propose a novel bottom-up pipeline for 3D planar reconstruction from monocular videos. Starting with 2D superpixels and geometric cues from pre-trained models, we initialize 3D planes as AlphaTablets and optimize them via differentiable rendering. An effective merging scheme is introduced to facilitate the growth and refinement of AlphaTablets. Through iterative optimization and merging, we reconstruct complete and accurate 3D planes with solid surfaces and clear boundaries. Extensive experiments on the ScanNet dataset demonstrate state-of-the-art performance in 3D planar reconstruction, underscoring the great potential of AlphaTablets as a generic 3D plane representation for various applications. Project page is available at: https://hyzcluster.github.io/alphatablets
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses 3D planar reconstruction from monocular videos. Specifically, the authors aim to overcome the limitations of existing methods on large-scale, complex scenes, including inconsistent geometric representations and difficulties in modeling texture and boundaries. To this end, they propose AlphaTablets, a novel and generic 3D plane representation.

### Main Problems and Challenges

1. **Limitations of Traditional Methods**:
   - Traditional 3D plane reconstruction methods rely on explicit geometric inputs, hand-crafted features, strong assumptions, and solvers, which limits their scalability and robustness.
   - Learning-driven methods can directly segment and regress plane parameters from single or sparse-view images, but they perform poorly on sparse-view image sequences and are difficult to scale to complex scenes.
2. **Deficiencies of Existing Representation Methods**:
   - 2D mask representations can accurately depict plane contours, but they are inconsistent across viewpoints, so reconstructing the 3D surface requires a complex matching and fusion process.
   - 3D representations (such as point clouds and surfels) depict the 3D plane surface directly, but because they sample it discretely, geometry and texture are discontinuous and complex plane boundaries are hard to model accurately.

### Solution: AlphaTablets

To address these problems, the authors propose AlphaTablets, a new 3D plane representation that models 3D planes as rectangles with alpha channels. AlphaTablets combine the advantages of 2D and 3D plane representations and can flexibly model the geometry, texture, and boundary of 3D planes while maintaining efficiency and consistency.

### Specific Contributions

1. **Proposing AlphaTablets**:
   - AlphaTablets represent 3D planes as rectangles with alpha channels, which gives a natural description of irregular boundaries together with a continuous, solid 3D surface.
   - A rasterization formulation for AlphaTablets is derived to enable differentiable rendering, so that 3D planes can be rendered into images efficiently.
2. **Constructing a 3D Plane Reconstruction System Based on AlphaTablets**:
   - A bottom-up 3D plane reconstruction pipeline is proposed. It is initialized from pre-trained monocular cues, optimizes geometry, texture, and alpha channels through differentiable rendering, and introduces an effective merging scheme that encourages larger and more complete planes to form.
3. **Performance Improvement**:
   - Extensive experiments on the ScanNet dataset show that the method achieves state-of-the-art performance in 3D planar reconstruction and demonstrate its potential as a generic 3D plane representation for various downstream applications.

### Summary

By introducing AlphaTablets, this paper addresses the inconsistent geometric representations and the difficulties in texture and boundary modeling found in existing 3D plane reconstruction methods, and provides a more accurate, complete, and generalizable 3D plane reconstruction system.
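
To make the idea of "rectangles with alpha channels" more concrete, the following is a minimal Python sketch of one way such a representation could be stored and rendered along a single camera ray. It is not the authors' implementation: the `Tablet` class, its field names, and the `intersect` and `render_ray` helpers are all hypothetical, and the sketch uses nearest-texel lookup with plain front-to-back alpha compositing instead of the differentiable rasterization derived in the paper.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class Tablet:
    """Hypothetical container: a 3D rectangle with per-texel color and alpha."""
    center: np.ndarray   # (3,) rectangle center in world space
    right: np.ndarray    # (3,) unit vector along the rectangle's width
    up: np.ndarray       # (3,) unit vector along the rectangle's height
    size: tuple          # (width, height) in world units
    color: np.ndarray    # (H, W, 3) texture map
    alpha: np.ndarray    # (H, W) opacity in [0, 1]; zeros carve out the boundary

    @property
    def normal(self):
        return np.cross(self.right, self.up)


def intersect(tablet, origin, direction):
    """Return (depth, rgb, alpha) where the ray hits the tablet, or None."""
    denom = np.dot(direction, tablet.normal)
    if abs(denom) < 1e-8:             # ray is parallel to the plane
        return None
    t = np.dot(tablet.center - origin, tablet.normal) / denom
    if t <= 0:                        # plane lies behind the ray origin
        return None
    offset = origin + t * direction - tablet.center
    width, height = tablet.size
    u = np.dot(offset, tablet.right) / width + 0.5   # normalized to [0, 1]
    v = np.dot(offset, tablet.up) / height + 0.5
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None                   # hit point is outside the rectangle
    rows, cols = tablet.alpha.shape
    i = min(int(v * rows), rows - 1)  # nearest-texel lookup
    j = min(int(u * cols), cols - 1)
    return t, tablet.color[i, j], tablet.alpha[i, j]


def render_ray(tablets, origin, direction):
    """Front-to-back alpha compositing of every tablet hit by one ray."""
    hits = [h for h in (intersect(tb, origin, direction) for tb in tablets)
            if h is not None]
    hits.sort(key=lambda h: h[0])     # nearest hit first
    rgb = np.zeros(3)
    transmittance = 1.0
    for _, color, alpha in hits:
        rgb += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return rgb, 1.0 - transmittance   # composited color and total opacity


# Usage: a red tablet whose left half is transparent, in front of an opaque blue one.
right, up = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
front = Tablet(np.array([0.0, 0.0, 1.0]), right, up, (1.0, 1.0),
               np.tile([1.0, 0.0, 0.0], (2, 2, 1)),
               np.array([[0.0, 1.0], [0.0, 1.0]]))
back = Tablet(np.array([0.0, 0.0, 2.0]), right, up, (1.0, 1.0),
              np.tile([0.0, 0.0, 1.0], (2, 2, 1)), np.ones((2, 2)))
ray_dir = np.array([0.0, 0.0, 1.0])
rgb_right, _ = render_ray([back, front], np.array([0.25, 0.0, 0.0]), ray_dir)
rgb_left, _ = render_ray([back, front], np.array([-0.25, 0.0, 0.0]), ray_dir)
```

In this toy scene the ray through the opaque half of the front tablet returns its red texel, while the ray through the transparent half passes on to the blue tablet behind it, which is how an alpha map can describe an irregular plane boundary inside a simple rectangle. A version suitable for optimization would presumably replace the hard bounds test and nearest-texel lookup with smooth, interpolated sampling in an automatic-differentiation framework.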