Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer

Yang Wu, Kaihua Zhang, Jianjun Qian, Jin Xie, Jian Yang
2024-07-29
Abstract: The complex traffic environment and various weather conditions make the collection of LiDAR data expensive and challenging. Achieving high-quality and controllable LiDAR data generation is urgently needed, controlling with text is a common practice, but there is little research in this field. To this end, we propose Text2LiDAR, the first efficient, diverse, and text-controllable LiDAR data generation model. Specifically, we design an equirectangular transformer architecture, utilizing the designed equirectangular attention to capture LiDAR features in a manner with data characteristics. Then, we design a control-signal embedding injector to efficiently integrate control signals through the global-to-focused attention mechanism. Additionally, we devise a frequency modulator to assist the model in recovering high-frequency details, ensuring the clarity of the generated point cloud. To foster development in the field and optimize text-controlled generation performance, we construct nuLiDARtext which offers diverse text descriptors for 34,149 LiDAR point clouds from 850 scenes. Experiments on uncontrolled and text-controlled generation in various forms on KITTI-360 and nuScenes datasets demonstrate the superiority of our approach.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is efficient, high-quality, and controllable LiDAR point cloud generation. The authors point out that complex traffic environments and changing weather conditions make collecting LiDAR data both expensive and challenging, so high-quality, controllable LiDAR data generation is urgently needed. Although text is a common control signal for generative models, there is little research on text-controlled LiDAR generation. To fill this gap, the authors propose Text2LiDAR, which they describe as the first efficient, diverse, and text-controllable LiDAR data generation model.

### Main contributions of the paper:

1. **Propose the first effective text-controllable LiDAR point cloud generation framework**: Text2LiDAR is designed around, and adapts to, the physical characteristics of the equirectangular image.
2. **Design two novel modules**: the Control Signal Embedding Injector (CEI) and the Frequency Modulator (FM). The CEI gradually and robustly fuses the control signal with the dominant features through a global-to-focused attention mechanism, while the FM compensates for the smoothness of the equirectangular image by helping the model recover high-frequency details, which assists training and improves generation quality.
3. **Construct high-quality text-LiDAR data pairs**: nuLiDARtext contains 34,149 text-LiDAR pairs covering 850 scenes, which improves the reliability of the generation results and supports progress in the field.

### Key problems solved:

1. **Lack of a controllable generation architecture applicable to equirectangular images and text**: Existing methods mainly rely on convolutional denoising architectures (such as U-Net), which are limited when dealing with equirectangular images: they cannot effectively capture long-distance relationships and are not easy to adapt to control signals of different modalities.
2. **Lack of reliable text-LiDAR paired data**: High-quality paired data must not only describe the main objects in the LiDAR point cloud, but also cover diverse scenes in terms of weather, lighting, vehicle posture, and environmental structure, so that the description is comprehensive. Current mainstream datasets cannot provide such paired data.

### Technical details:

- **Equirectangular Transformer network**: The authors design Equirectangular Attention (EA) and Reverse Equirectangular Attention (REA) for feature extraction and up-sampling. These can capture long-distance relationships between any two points and adapt to the circular structure of the equirectangular image.
- **Control Signal Embedding Injector (CEI)**: Through a global-to-focused attention mechanism, the CEI gradually fuses the control signal with the dominant features, strengthening the model's text controllability.
- **Frequency Modulator (FM)**: The FM compensates for the smoothness of the equirectangular image and helps keep the details of the generated point cloud clear.

### Experimental verification:

- **Uncontrolled generation experiment**: Uncontrolled generation experiments on the KITTI-360 and nuScenes datasets show that Text2LiDAR outperforms existing methods on multiple evaluation metrics.
- **LiDAR point cloud densification experiment**: The effectiveness and practicality of the model are verified by densifying existing sparse LiDAR point clouds.

In conclusion, by proposing the Text2LiDAR model, this paper addresses the problem of high-quality and controllable LiDAR point cloud generation and provides a technical basis and data support for the development of this field.
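For readers unfamiliar with the equirectangular image mentioned throughout this summary, the following is a minimal sketch of the standard spherical projection that turns a LiDAR sweep into such an image, with rows indexing elevation and columns indexing azimuth. It is not taken from the paper: the function names `project_to_range_image` and `circular_neighbors`, the tiny image size, and the field-of-view values are all illustrative assumptions. The sketch also shows why the image has a circular structure: the first and last columns are adjacent in azimuth.

```python
import math


def project_to_range_image(points, height=4, width=8,
                           fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z) points onto an equirectangular range image.

    Rows index elevation (row 0 is the highest beam) and columns index
    azimuth. Empty pixels hold 0.0. All default values are illustrative
    assumptions, not the settings used in the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]

    for x, y, z in points:
        depth = math.sqrt(x * x + y * y + z * z)
        if depth == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)        # horizontal angle in (-pi, pi]
        elevation = math.asin(z / depth)  # vertical angle
        if not fov_down <= elevation <= fov_up:
            continue  # outside the vertical field of view

        # Azimuth is periodic, so the column index wraps modulo width.
        col = int(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = min(height - 1, int((fov_up - elevation) / fov * height))

        # If several points fall into one pixel, keep the nearest return.
        if image[row][col] == 0.0 or depth < image[row][col]:
            image[row][col] = depth
    return image


def circular_neighbors(col, width):
    """Return the left and right azimuth neighbours of a column, with wraparound."""
    return ((col - 1) % width, (col + 1) % width)


if __name__ == "__main__":
    # Two points that are almost identical in 3D, just either side of the
    # azimuth seam behind the sensor, land in the first and last columns.
    sweep = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (-1.0, -1e-9, 0.0)]
    for row in project_to_range_image(sweep):
        print(row)
    print(circular_neighbors(0, 8))
```

A plain 2D convolution treats the left and right image borders as unrelated, whereas the wraparound above makes them neighbours. This is one way to read the summary's remark that the architecture adapts to the circular structure of the equirectangular image.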