Abstract: 3D scene understanding is crucial for facilitating seamless interaction between digital devices and the physical world. Real-time capturing and processing of the 3D scene are essential for achieving this seamless integration. While existing approaches typically separate acquisition and processing for each frame, the advent of resolution-scalable 3D sensors offers an opportunity to overcome this paradigm and fully leverage the otherwise wasted acquisition time to initiate processing. In this study, we introduce VX-S3DIS, a novel point cloud dataset accurately simulating the behavior of a resolution-scalable 3D sensor. Additionally, we present RESSCAL3D++, an important improvement over our prior work, RESSCAL3D, by incorporating an update module and processing strategy. By applying our method to the new dataset, we practically demonstrate the potential of joint acquisition and semantic segmentation of 3D point clouds. Our resolution-scalable approach significantly reduces scalability costs from 2% to just 0.2% in mIoU while achieving impressive speed-ups of 15.6 to 63.9% compared to the non-scalable baseline. Furthermore, our scalable approach enables early predictions, with the first one occurring after only 7% of the total inference time of the baseline. The new VX-S3DIS dataset is available at https://github.com/remcoroyen/vx-s3dis.

What problem does this paper attempt to address?

The paper asks how to capture and process 3D point clouds in real time so that digital devices can interact seamlessly with the physical world. Specifically, it focuses on jointly acquiring and semantically segmenting 3D point clouds, which reduces processing latency and improves efficiency.

### Background Problem

Existing methods usually separate the acquisition and processing of each frame, so the acquisition time is wasted. This is especially costly in applications that require real-time interaction (such as robotics and autonomous driving). To address this, the paper uses a resolution-scalable 3D sensor so that processing starts while data is still being acquired, making full use of the acquisition time and reducing the overall reaction time.

### Main Contributions

1. **VX-S3DIS Dataset**:
   - A new point cloud dataset, VX-S3DIS, is introduced. It simulates the behavior of a resolution-scalable 3D sensor and allows semantic processing during the scanning process.
2. **RESSCAL3D++ Method**:
   - The previous RESSCAL3D method is improved. By introducing an update module and a processing strategy, the scalability cost is reduced from 2% to 0.2% in mIoU while the inference-time efficiency is maintained.
   - Early prediction is achieved: the first prediction is available after only 7% of the baseline's total inference time.
3. **Experimental Verification**:
   - Extensive experiments are carried out on two datasets, demonstrating the potential of the method for joint acquisition and processing. On the VX-S3DIS dataset in particular, inference-time speed-ups of 15.6% to 63.9% over the non-scalable baseline are obtained.

### Formula Representation

- The point cloud stream \(P\) can be represented as:
  \[ P=\{P(t_1),\ldots, P(t_{s_1}), P(t_{s_1 + 1}),\ldots, P(t_{s_2}),\ldots\} \]
  where \(P(t_i)\) is the point acquired at timestamp \(t_i\).
- The \(i\)-th partition \(X_i\in\mathbb{R}^{N_i\times3}\), containing \(N_i\) points, is represented as:
  \[ X_i = \{P(t_{s_{i - 1}+1}),\ldots, P(t_{s_i})\} \]
- The update module is expressed as:
  \[ Y^{(s_{i+2})}_i=UM(Y^{(s_{i+1})}_i, Y^{(s_{i+2})}_{i+1}) = UM(UM(Y^{(s_i)}_i, Y^{(s_{i+1})}_{i+1}), UM(Y^{(s_{i+1})}_{i+1}, Y^{(s_{i+2})}_{i+2})) \]
  where \(UM\) denotes the update module, which uses \(K\)-nearest-neighbor voting to refine the prediction.

### Summary

This paper addresses real-time acquisition and processing of 3D point clouds by introducing the VX-S3DIS dataset and the improved RESSCAL3D++ method, improving processing efficiency and reducing latency.
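
The partition and update-module formulas above can be illustrated with a small sketch. It is not the paper's implementation: the function names `split_stream` and `update_module`, the parameter `k`, the brute-force neighbor search, the string labels, and the exact voting rule (each earlier point takes the majority label of its `k` nearest newer points) are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def split_stream(points, boundaries):
    """Cut a time-ordered point stream into partitions X_1, X_2, ...

    `boundaries` holds the cumulative end indices s_1 < s_2 < ...,
    so partition i covers points[s_{i-1}:s_i] (0-based slicing).
    """
    partitions, start = [], 0
    for end in boundaries:
        partitions.append(points[start:end])
        start = end
    return partitions


def update_module(old_points, old_labels, new_points, new_labels, k=3):
    """Refine earlier labels by majority vote over the k nearest newer points.

    Assumption: a plain brute-force K-NN vote; the earlier labels are
    kept unchanged only when there are no newer points to vote.
    """
    if not new_points:
        return list(old_labels)
    refined = []
    for point in old_points:
        # Indices of the k newer points closest to this earlier point.
        nearest = sorted(range(len(new_points)),
                         key=lambda j: dist(point, new_points[j]))[:k]
        votes = Counter(new_labels[j] for j in nearest)
        refined.append(votes.most_common(1)[0][0])
    return refined


# Hypothetical stream: 2 coarse points arrive first, then 6 denser points.
stream = [
    (0.0, 0.0, 0.0), (5.0, 5.0, 5.0),
    (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0),
    (5.1, 5.0, 5.0), (5.0, 5.1, 5.0), (5.1, 5.1, 5.0),
]
x1, x2 = split_stream(stream, [2, 8])
y1 = ["wall", "wall"]  # made-up coarse prediction for X_1
y2 = ["floor", "floor", "floor", "chair", "chair", "chair"]  # made-up labels for X_2
print(update_module(x1, y1, x2, y2, k=3))
```

In this toy run the coarse labels of the first partition are overwritten by the labels of the denser second partition. A practical version would likely replace the brute-force search with a spatial index, since sorting all newer points for every earlier point scales poorly.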

RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds

Pass3d: Precise And Accelerated Semantic Segmentation For 3d Point Cloud

RESSCAL3D: Resolution Scalable 3D Semantic Segmentation of Point Clouds

3D Object Segmentation Using Cross-Window Point Transformer with Latent Semantic Boundary Guidance

MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds

A Unified Framework for 3D Point Cloud Visual Grounding

Deep Projective 3D Semantic Segmentation

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering

Real-Time Semantic Segmentation of LiDAR Point Clouds on Edge Devices for Unmanned Systems

SEGCloud: Semantic Segmentation of 3D Point Clouds

Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation

Multi-view Incremental Segmentation of 3D Point Clouds for Mobile Robots

Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion

EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Associatively Segmenting Instances and Semantics in Point Clouds

VIASEG: Visual Information Assisted Lightweight Point Cloud Segmentation