Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring

Huan Wang,Chenxi Li,Yan-Fu Li
DOI: https://doi.org/10.1109/tii.2024.3441638
IF: 12.3
2024-01-01
IEEE Transactions on Industrial Informatics
Abstract:Industrial visual monitoring (IVM) is crucial in enhancing the reliability and efficiency of manufacturing processes. Recently, large vision-language models (LVLMs) have demonstrated remarkable semantic understanding and natural language interaction capabilities, which provide a novel solution to IVM. However, LVLMs pretrained on common domains lack specific knowledge for IVM scenarios, causing insufficient adaptation to industrial image patterns and specialized textual corpora. In this article, we deeply studied the adaptation of LVLMs to IVM and proposed DefectGLM. First, we proposed the first large-scale multimodal wafer dataset as a reliable data basis for model domain generalization. Second, this model employs low-rank adaptation-based contrast visual adaptation to align with industrial image patterns and utilizes vision-language instruction tuning for professional knowledge alignment. DefectGLM is the first large-model-based wafer image recognition model, and can accurately identify 36 types of wafer defects and provide appropriate text descriptions. DefectGLM provides a new solution for the development of industrial large models.
What problem does this paper attempt to address?