Abstract: We introduce AlphaTablets, a novel and generic representation of 3D planes that features continuous 3D surface and precise boundary delineation. By representing 3D planes as rectangles with alpha channels, AlphaTablets combine the advantages of current 2D and 3D plane representations, enabling accurate, consistent and flexible modeling of 3D planes. We derive differentiable rasterization on top of AlphaTablets to efficiently render 3D planes into images, and propose a novel bottom-up pipeline for 3D planar reconstruction from monocular videos. Starting with 2D superpixels and geometric cues from pre-trained models, we initialize 3D planes as AlphaTablets and optimize them via differentiable rendering. An effective merging scheme is introduced to facilitate the growth and refinement of AlphaTablets. Through iterative optimization and merging, we reconstruct complete and accurate 3D planes with solid surfaces and clear boundaries. Extensive experiments on the ScanNet dataset demonstrate state-of-the-art performance in 3D planar reconstruction, underscoring the great potential of AlphaTablets as a generic 3D plane representation for various applications. Project page is available at: https://hyzcluster.github.io/alphatablets
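To make the idea of "rectangles with alpha channels" concrete, the following is a minimal sketch, not the authors' implementation. The class name `Tablet`, its fields, and the method `alpha_along_ray` are hypothetical names chosen for illustration. The sketch assumes a rectangle defined by a center and two orthonormal in-plane axes, with opacity stored on a small grid and read by nearest-neighbor lookup; a differentiable renderer would more plausibly interpolate, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class Tablet:
    """Hypothetical plane primitive: a 3D rectangle with an alpha grid."""
    center: Vec3
    u_axis: Vec3              # unit vector along the rectangle's width
    v_axis: Vec3              # unit vector along its height, orthogonal to u_axis
    half_w: float             # half of the width
    half_h: float             # half of the height
    alpha: List[List[float]]  # rows x cols opacities in [0, 1]

    @property
    def normal(self) -> Vec3:
        return cross(self.u_axis, self.v_axis)

    def alpha_along_ray(self, origin: Vec3,
                        direction: Vec3) -> Optional[Tuple[float, float]]:
        """Return (ray parameter t, alpha) at the hit point, or None on a miss."""
        n = self.normal
        denom = dot(n, direction)
        if abs(denom) < 1e-9:          # ray is parallel to the plane
            return None
        t = dot(n, sub(self.center, origin)) / denom
        if t <= 0:                     # plane is behind the ray origin
            return None
        hit = (origin[0] + t * direction[0],
               origin[1] + t * direction[1],
               origin[2] + t * direction[2])
        rel = sub(hit, self.center)
        u, v = dot(rel, self.u_axis), dot(rel, self.v_axis)
        if abs(u) > self.half_w or abs(v) > self.half_h:
            return None                # outside the rectangle
        rows, cols = len(self.alpha), len(self.alpha[0])
        # Nearest-neighbor lookup into the alpha grid.
        col = min(int((u + self.half_w) / (2 * self.half_w) * cols), cols - 1)
        row = min(int((v + self.half_h) / (2 * self.half_h) * rows), rows - 1)
        return t, self.alpha[row][col]


# Usage: a 2 x 2 rectangle at z = 2 whose right half is fully transparent.
tablet = Tablet(center=(0.0, 0.0, 2.0), u_axis=(1.0, 0.0, 0.0),
                v_axis=(0.0, 1.0, 0.0), half_w=1.0, half_h=1.0,
                alpha=[[1.0, 0.0], [1.0, 0.0]])
print(tablet.alpha_along_ray((0.0, 0.0, 0.0), (-0.25, 0.0, 1.0)))
```

The point of the sketch is that the rectangle supplies a continuous surface while the alpha grid carves out an irregular boundary inside it, which is the combination the abstract describes.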

What problem does this paper attempt to address?

This paper addresses 3D planar reconstruction from monocular videos. Specifically, the authors aim to overcome the limitations of existing methods on large-scale, complex scenes, including inconsistent geometric representations and difficulties in modeling texture and boundaries. To this end, they propose AlphaTablets, a novel and generic 3D plane representation.

### Main Problems and Challenges

1. **Limitations of Traditional Methods**:
   - Traditional 3D plane reconstruction methods rely on explicit geometric inputs, hand-crafted features, strong assumptions, and solvers, which limits their scalability and robustness.
   - Learning-based methods can directly segment planes and regress plane parameters from single or sparse-view images, but they perform poorly on sparse-view image sequences and are difficult to scale to complex scenes.
2. **Deficiencies of Existing Representations**:
   - 2D mask representations can accurately depict plane contours, but they are inconsistent across viewpoints and require a complex matching and fusion process to reconstruct the 3D surface.
   - 3D representations (such as point clouds and surfels) depict the 3D plane surface directly, but because they sample it discretely, geometry and texture are discontinuous and complex plane boundaries are hard to model accurately.

### Solution: AlphaTablets

To address these problems, the authors propose AlphaTablets, a new 3D plane representation that models 3D planes as rectangles with alpha channels. AlphaTablets combine the advantages of 2D and 3D plane representations and can flexibly model the geometry, texture, and boundaries of 3D planes while remaining efficient and consistent.

### Specific Contributions

1. **Proposing AlphaTablets**:
   - AlphaTablets represent 3D planes as rectangles with alpha channels, which gives a natural description of irregular boundaries and a continuous, solid 3D surface.
   - A rasterization formulation for AlphaTablets is derived to enable differentiable rendering, so that 3D planes can be rendered into images efficiently.
2. **Constructing a 3D Plane Reconstruction System Based on AlphaTablets**:
   - A bottom-up 3D plane reconstruction pipeline is proposed. It is initialized from pre-trained monocular cues, optimizes geometry, texture, and alpha channels through differentiable rendering, and uses a merging scheme to grow larger and more complete planes.
3. **Performance Improvement**:
   - Extensive experiments on the ScanNet dataset show that the method achieves state-of-the-art performance in 3D planar reconstruction and demonstrate its potential as a generic 3D plane representation for various downstream applications.

### Summary

By introducing AlphaTablets, this paper tackles the inconsistent geometric representations and the texture and boundary modeling difficulties of existing 3D plane reconstruction methods, and provides a more accurate, complete, and general 3D plane reconstruction system.
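The merging step of the bottom-up pipeline can be illustrated with a small sketch. This is not the authors' merging scheme, whose exact criteria are not given in this summary. It assumes, purely for illustration, that each plane is a unit normal plus an offset, that a list of neighboring plane pairs is available, and that two neighbors merge when their normals and offsets agree within thresholds. The names `merge_planes`, `max_angle_deg`, and `max_offset` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A plane is (unit normal n, offset d), describing the points x with n . x = d.
Plane = Tuple[Tuple[float, float, float], float]


def find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_planes(planes: List[Plane],
                 neighbors: List[Tuple[int, int]],
                 max_angle_deg: float = 10.0,   # assumed threshold
                 max_offset: float = 0.05       # assumed threshold
                 ) -> List[List[int]]:
    """Group neighboring planes that are nearly coplanar.

    Returns a sorted list of index groups, one per merged plane.
    """
    parent = list(range(len(planes)))
    cos_thresh = math.cos(math.radians(max_angle_deg))
    for i, j in neighbors:
        (n_i, d_i), (n_j, d_j) = planes[i], planes[j]
        cos_angle = sum(a * b for a, b in zip(n_i, n_j))
        if cos_angle >= cos_thresh and abs(d_i - d_j) <= max_offset:
            r_i, r_j = find(parent, i), find(parent, j)
            if r_i != r_j:
                # Keep the smaller index as the representative.
                parent[max(r_i, r_j)] = min(r_i, r_j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(planes)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


# Usage: two nearly coplanar floor patches, one wall, one raised surface.
planes = [((0.0, 0.0, 1.0), 1.00),
          ((0.0, 0.0, 1.0), 1.02),
          ((1.0, 0.0, 0.0), 1.00),
          ((0.0, 0.0, 1.0), 2.00)]
neighbors = [(0, 1), (1, 2), (1, 3)]
print(merge_planes(planes, neighbors))
```

This single pass compares the original plane parameters only. An iterative pipeline of the kind the summary describes would presumably re-optimize the merged planes and repeat, which the sketch leaves out.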

AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos

A Submap Joining Algorithm for 3D Reconstruction Using an RGB-D Camera Based on Point and Plane Features

In-Hand 3D Object Reconstruction from a Monocular RGB Video

MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction

PlanarRecon: Realtime 3D Plane Detection and Reconstruction from Posed Monocular Videos

Mobile3DRecon: Real-time Monocular 3D Reconstruction on a Mobile Phone

Mobile3DScanner: an Online 3D Scanner for High-quality Object Reconstruction with a Mobile Device

Real-time dense 3D reconstruction and camera tracking via embedded planes representation

UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

Tri$^{2}$-plane: Thinking Head Avatar via Feature Pyramid

PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo

Real-time Dense Reconstruction of Tissue Surface from Stereo Optical Video

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Planar Reconstruction of Indoor Scenes from Sparse Views and Relative Camera Poses

Real-Time Trust Region Ground Plane Segmentation For Monocular Mobile Robots

R4D-planes: Remapping Planes for Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Three-dimensional point cloud plane segmentation in both structured and unstructured environments

PlaneFusion: Real-Time Indoor Scene Reconstruction With Planar Prior

Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior