FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries

Yuqi Jiang, Xudong Lu, Qian Jin, Qi Sun, Hanming Wu, Cheng Zhuo
2024-07-15
Abstract: Intelligence is key to advancing integrated circuit (IC) fabrication. Recent breakthroughs in Large Multimodal Models (LMMs) have unlocked unparalleled abilities in understanding images and text, fostering intelligent fabrication. Leveraging the power of LMMs, we introduce FabGPT, a customized IC fabrication large multimodal model for wafer defect knowledge query. FabGPT manifests expertise in conducting defect detection in Scanning Electron Microscope (SEM) images, performing root cause analysis, and providing expert question-answering (Q&A) on fabrication processes. FabGPT matches enhanced multimodal features to automatically detect minute defects under complex wafer backgrounds and reduce the subjectivity of manual threshold settings. Besides, the proposed modulation module and interactive corpus training strategy embed wafer defect knowledge into the pre-trained model, effectively balancing Q&A queries related to defect knowledge and original knowledge and mitigating the modality bias issues. Experiments on in-house fab data (SEM-WaD) show that our FabGPT achieves significant performance improvement in wafer defect detection and knowledge querying.
Computer Vision and Pattern Recognition, Artificial Intelligence, Hardware Architecture, Machine Learning
What problem does this paper attempt to address?
This paper addresses wafer defect knowledge querying in integrated circuit (IC) manufacturing. Existing large multimodal models (LMMs) perform well on visual tasks in general scenarios, but they are not sensitive enough to the domain-specific knowledge of IC manufacturing, which makes them ineffective at wafer defect detection and the related knowledge queries. Existing methods also struggle with small-defect detection and with comprehensive question-answering tasks. In particular, when defects sit on complex backgrounds and the user's query is only loosely related to the visual input, the model tends to generate responses biased toward the visual content, a problem known as "modality bias".

To address these problems, the paper proposes FabGPT, an efficient large multimodal model. FabGPT tackles them in three ways:

1. **Modal Enhancement**: A prediction module (PM) and two enhancement branches strengthen the semantic features related to defects and reduce the influence of irrelevant features.
2. **Defect Detection**: A detection head automatically learns a specific threshold for each pixel and generates a pixel-level mask, so that defects can be detected accurately against a complex wafer background.
3. **Question-Answering Stage**: A modulation module and an interactive corpus training strategy balance the learning of new and old knowledge and alleviate modality bias in the dialogue.

The experimental results show that, in the supervised detection task, FabGPT reaches an image-level accuracy of 91.81%, a pixel-level accuracy of 95.61%, a PRO of 88.17%, and an AP of 85.80%. In the question-answering task, FabGPT reaches an accuracy of 96.86%, which is significantly better than the baseline model.
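
To make the idea of a per-pixel threshold more concrete, the following is a minimal NumPy sketch, not FabGPT's actual detection head. It assumes a defect score map and a threshold map of the same shape; in a learned model the threshold map would be predicted by the network, whereas here it is set by hand. The function names (`per_pixel_mask`, `pixel_accuracy`) and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def per_pixel_mask(score_map, threshold_map):
    """Binarize a defect score map using a separate threshold for each pixel."""
    if score_map.shape != threshold_map.shape:
        raise ValueError("score_map and threshold_map must have the same shape")
    return score_map > threshold_map


def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return float(np.mean(pred_mask == true_mask))


# Toy 4x4 score map: the left half is a smooth background (low scores),
# the right half is a textured background (high baseline scores).
score = np.array([
    [0.10, 0.20, 0.65, 0.70],
    [0.15, 0.45, 0.60, 0.75],
    [0.10, 0.20, 0.62, 0.68],
    [0.12, 0.18, 0.66, 0.72],
])

# Hypothetical ground truth: one defect in each background region.
truth = np.zeros((4, 4), dtype=bool)
truth[1, 1] = True
truth[1, 3] = True

# Hand-set stand-in for a learned threshold map: low on the smooth side,
# high on the textured side.
threshold_map = np.tile(np.array([0.30, 0.30, 0.73, 0.73]), (4, 1))

local_mask = per_pixel_mask(score, threshold_map)
global_mask = score > 0.5  # single manually chosen threshold, for comparison

print("per-pixel threshold accuracy:", pixel_accuracy(local_mask, truth))
print("global threshold accuracy:", pixel_accuracy(global_mask, truth))
```

In this toy case the single global threshold misses the faint defect on the smooth side and flags most of the textured side, while the per-pixel thresholds recover both defects. This is the kind of background-dependent behavior that a manually chosen threshold cannot provide.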