The Applications of 3D Input Data and Scalability Element by Transformer Based Methods: A Review

Abubakar Sulaiman Gezawa, Chibiao Liu, Naveed Ur Rehman Junejo, Haruna Chiroma
DOI: https://doi.org/10.1007/s11831-024-10108-4
IF: 9.7
2024-04-25
Archives of Computational Methods in Engineering
Abstract: The outstanding effectiveness of transformers in visual tasks has led to their rapid growth and adoption in three-dimensional (3D) vision tasks. Vision transformers have shown numerous advantages over earlier convolutional neural network (CNN) architectures, including broader and more substantial modelling capabilities, complementarity with convolution, scalability with model and data size, and better connectivity, which have improved the performance records of many visual tasks. We present a thorough review that classifies and summarizes popular transformer-based approaches according to key features of transformer integration: the input data, the scalability element that enables transformer processing, the architectural design, and the context level at which the transformer operates. We also highlight the primary contributions of each transformer approach. Furthermore, we compare the results of these techniques with those of commonly employed non-transformer techniques for 3D object classification, segmentation, and object detection on standard 3D datasets, including ModelNet, SUN RGB-D, ScanNet, nuScenes, Waymo, ShapeNet, S3DIS, and KITTI. This study also discusses numerous potential future directions and limitations of 3D vision transformers.
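One reason transformers suit raw 3D input such as point clouds is that a point cloud is an unordered set, and self-attention without positional encodings is permutation equivariant: reordering the input points only reorders the outputs. The following is a minimal sketch of generic scaled dot-product self-attention over 3D points, not any specific method surveyed in the review. For illustration it assumes identity query/key/value projections (real models learn these), and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(points: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over a set of point features.

    Assumption for illustration: queries, keys and values are the raw
    point coordinates (identity projections); trained models instead
    apply learned projection matrices W_q, W_k, W_v.
    """
    d = len(points[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in points:
        # Similarity of this query point to every key point in the set.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in points]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, points)) for j in range(d)])
    return out
```

Because each output depends only on its own query point and the unordered set of keys and values, feeding the points in a shuffled order yields the same output vectors in the shuffled order, so no canonical point ordering is required.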
Computer Science, Interdisciplinary Applications; Engineering, Multidisciplinary; Mathematics