PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection

Qihang Zhou, Jiangtao Yan, Shibo He, Wenchao Meng, Jiming Chen
2024-10-28
Abstract: Zero-shot (ZS) 3D anomaly detection is a crucial yet unexplored field that addresses scenarios where target 3D training samples are unavailable due to practical concerns like privacy protection. This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP for recognizing 3D anomalies on unseen objects. PointAD provides a unified framework to comprehend 3D anomalies from both points and pixels. In this framework, PointAD renders 3D anomalies into multiple 2D renderings and projects them back into 3D space. To capture generic anomaly semantics in PointAD, we propose hybrid representation learning that optimizes the learnable text prompts from 3D and 2D through auxiliary point clouds. The collaborative optimization of point and pixel representations jointly helps our model grasp underlying 3D anomaly patterns, contributing to detecting and segmenting anomalies in diverse unseen 3D objects. Through the alignment of 3D and 2D space, our model can directly integrate RGB information, further enhancing the understanding of 3D anomalies in a plug-and-play manner. Extensive experiments show the superiority of PointAD in ZS 3D anomaly detection across diverse unseen objects.
Computer Vision and Pattern Recognition, Computation and Language
What problem does this paper attempt to address?
The problem that this paper attempts to solve is zero-shot (ZS) 3D anomaly detection: detecting and segmenting anomalies in unseen 3D objects without any training samples of those target objects. In practical applications, constraints such as privacy protection can make target 3D training samples unavailable, so traditional 3D anomaly detection methods, which must be trained on normal samples of each target object, are difficult to apply.

### Problem Background

1. **Limitations of Traditional 3D Anomaly Detection**:
   - Existing 3D anomaly detection methods usually assume that anomaly-free point clouds of the target object are available for training, and they identify anomalies by storing the features of these normal samples and comparing test samples against them.
   - Because they depend on the normal features of specific objects, these methods cannot be applied directly to new objects or to data that cannot be collected for privacy reasons.
2. **Requirement for Zero-shot 3D Anomaly Detection**:
   - In many practical scenarios, such as industrial inspection and medical imaging, it may be impossible to obtain enough normal samples for training.
   - A method is needed that detects anomalies in unseen objects without relying on target 3D training samples.

### Core Contributions of the Paper

1. **Proposing the PointAD Framework**:
   - PointAD is a unified framework that understands 3D anomalies from both points (the point cloud) and pixels (its 2D renderings).
   - By transferring the strong generalization ability of CLIP (Contrastive Language-Image Pre-training) to the 3D domain, PointAD detects and segments 3D anomalies in the zero-shot setting.
2. **Multi-view Rendering and Hybrid Representation Learning**:
   - PointAD converts 3D point clouds into 2D images through multi-view rendering and uses CLIP's visual encoder to extract 2D representations.
   - It proposes hybrid representation learning, which optimizes learnable text prompts on auxiliary point clouds using both 3D and 2D representations, so that the prompts capture generic normal and abnormal semantics.
3. **Cross-modal Integration**:
   - PointAD can directly integrate 2D RGB information to achieve zero-shot multimodal 3D (M3D) anomaly detection without additional modules or retraining.

### Key Technologies of the Solution

1. **Multi-view Rendering**:
   - Rendering the 3D point cloud from multiple viewpoints produces 2D images that preserve local semantic information.
   - High-precision rendering is used so that fine-grained abnormal semantics are represented accurately.
2. **Hybrid Representation Learning**:
   - Text prompts are optimized with both global and local anomaly semantics, helping the model understand and generalize abnormal patterns across different objects.
   - Multiple instance learning (MIL) and multi-task learning (MTL) are used to handle global and local anomalies, respectively.
3. **Cross-modal Fusion**:
   - In the testing phase, RGB images can be fed directly into the 2D branch; their features are extracted, projected back into 3D space, and used to calculate the anomaly score.

### Experimental Results

- PointAD has been evaluated on multiple public datasets, including MVTec3D-AD, Eyecandies, and Real3D-AD.
- The experimental results show that PointAD significantly outperforms existing methods in zero-shot 3D anomaly detection, especially in cross-dataset generalization.

In conclusion, this paper addresses the challenging but highly valuable problem of zero-shot 3D anomaly detection. It proposes the PointAD framework, which achieves effective anomaly detection on unseen 3D objects through multi-view rendering and hybrid representation learning.
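The multi-view rendering step renders a point cloud from several viewpoints and then projects 2D predictions back onto the points. The sketch below illustrates only that point-to-pixel round trip, under stated assumptions: each view is a rotation about the vertical axis followed by a simple pinhole projection, and each point's score is the mean of the 2D anomaly scores at the pixels it lands on. The helpers `rotate_y`, `project`, `back_project_scores` and the parameters `focal`, `size`, `cam_dist` are hypothetical; the paper's actual renderer and camera settings are not given in this summary. Occlusion is ignored here, whereas a real renderer needs a depth test so hidden points do not inherit scores from the visible surface.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_y(p: Point, angle: float) -> Point:
    """Rotate a 3D point about the vertical (y) axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(p: Point, focal: float, size: int, cam_dist: float) -> Tuple[int, int]:
    """Pinhole projection onto a size x size image (hypothetical camera setup).

    The camera sits at (0, 0, cam_dist) looking toward the origin; image y-axis
    flipping is ignored. Returns (row, col), clamped to valid pixel indices.
    """
    x, y, z = p
    depth = cam_dist - z  # distance from the camera along its viewing axis
    if depth <= 0:
        raise ValueError("point lies behind the camera")
    u = focal * x / depth + size / 2
    v = focal * y / depth + size / 2
    col = min(max(math.floor(u), 0), size - 1)
    row = min(max(math.floor(v), 0), size - 1)
    return row, col


def back_project_scores(
    points: Sequence[Point],
    view_angles: Sequence[float],
    pixel_maps: Sequence[List[List[float]]],
    focal: float = 100.0,
    size: int = 64,
    cam_dist: float = 3.0,
) -> List[float]:
    """Average, over views, the 2D anomaly score at the pixel each point projects to.

    Occlusion is ignored: every point receives a score from every view.
    """
    if len(view_angles) != len(pixel_maps) or not view_angles:
        raise ValueError("need one anomaly map per view angle")
    scores = []
    for p in points:
        total = 0.0
        for angle, amap in zip(view_angles, pixel_maps):
            row, col = project(rotate_y(p, angle), focal, size, cam_dist)
            total += amap[row][col]
        scores.append(total / len(view_angles))
    return scores
```

Averaging across views dilutes an anomaly that is visible in only one rendering; max pooling over views would be the more sensitive alternative, and this summary does not say which aggregation PointAD uses.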
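Hybrid representation learning optimizes learnable text prompts so that a point or pixel feature can be scored against a "normal" and an "abnormal" text embedding, as is common in CLIP-based zero-shot anomaly detection. The following minimal sketch assumes that fixed vectors stand in for the encoded learnable prompts, that similarity is cosine similarity, and that a two-way softmax with a CLIP-style temperature (0.07 is an assumed value) turns the similarities into an anomaly probability. `cosine` and `anomaly_probability` are hypothetical helpers, not PointAD's API.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_probability(
    feature: Sequence[float],
    normal_text: Sequence[float],
    abnormal_text: Sequence[float],
    temperature: float = 0.07,
) -> float:
    """P(abnormal) from a two-way softmax over prompt similarities (hypothetical helper)."""
    logit_normal = cosine(feature, normal_text) / temperature
    logit_abnormal = cosine(feature, abnormal_text) / temperature
    m = max(logit_normal, logit_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(logit_normal - m)
    e_abnormal = math.exp(logit_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)
```

Applied to a global feature, this yields an object-level detection score; applied to every point or pixel feature, it yields a segmentation map. In a prompt-learning setup, only the prompt vectors would be trained while CLIP stays frozen; that is the usual arrangement but an assumption here.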
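The summary states that MIL handles global anomalies and MTL handles local ones. One hedged reading is sketched below. For the global objective, the point cloud is treated as a bag whose instances are its renderings, and the bag is abnormal if any instance is (max pooling is an assumption). For the local objective, a 3D point-level segmentation loss and a 2D pixel-level loss averaged over views are summed as two tasks. Binary cross-entropy, equal task weights, and all function names (`bce`, `mean_bce`, `mil_global_loss`, `mtl_local_loss`) are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import Sequence


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, clipped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean_bce(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Average BCE over a set of points or pixels."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    return sum(bce(p, y) for p, y in zip(probs, labels)) / len(probs)


def mil_global_loss(view_probs: Sequence[float], object_label: int) -> float:
    """MIL over renderings: the object (bag) is abnormal if any view (instance) is.

    Max pooling over views is an assumed choice.
    """
    return bce(max(view_probs), object_label)


def mtl_local_loss(
    point_probs: Sequence[float],
    point_labels: Sequence[int],
    pixel_probs_per_view: Sequence[Sequence[float]],
    pixel_labels_per_view: Sequence[Sequence[int]],
    weight_2d: float = 1.0,
) -> float:
    """Two tasks: 3D point-level segmentation plus mean 2D pixel-level segmentation over views."""
    loss_3d = mean_bce(point_probs, point_labels)
    view_losses = [
        mean_bce(p, y) for p, y in zip(pixel_probs_per_view, pixel_labels_per_view)
    ]
    loss_2d = sum(view_losses) / len(view_losses)
    return loss_3d + weight_2d * loss_2d
```

Max pooling encodes the MIL assumption directly: a single abnormal rendering is enough to label the whole object abnormal, while a normal object must look normal from every view.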
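For cross-modal fusion at test time, the RGB image passes through the 2D branch and its anomaly map is carried back to the points. The sketch below assumes that each point already has a known pixel correspondence in the RGB image (`point_to_pixel`), that the RGB branch has produced a 2D anomaly map, and that the two scores are blended linearly with a hypothetical weight `alpha`. The object-level score is taken as the maximum fused point score. None of these choices is confirmed by the summary, and `fuse_point_and_rgb` and `object_score` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def fuse_point_and_rgb(
    point_scores: Sequence[float],
    rgb_pixel_scores: Sequence[Sequence[float]],
    point_to_pixel: Sequence[Tuple[int, int]],
    alpha: float = 0.5,
) -> List[float]:
    """Blend each point's point-branch score with the RGB score at its corresponding pixel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    fused = []
    for score, (row, col) in zip(point_scores, point_to_pixel):
        fused.append(alpha * score + (1.0 - alpha) * rgb_pixel_scores[row][col])
    return fused


def object_score(fused: Sequence[float]) -> float:
    """Object-level anomaly score as the maximum fused point score (assumed aggregation)."""
    return max(fused)
```

Because the fusion needs only a point-to-pixel lookup, it adds no trainable parameters, which fits the summary's claim that RGB information can be integrated without additional modules or retraining.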