ADs: Active Data-sharing for Data Quality Assurance in Advanced Manufacturing Systems

Yue Zhao, Yuxuan Li, Chenang Liu, Yinan Wang
2024-03-31
Abstract: Machine learning (ML) methods are widely used in industrial applications and usually require a large amount of training data. However, data collection in manufacturing systems requires extensive time and investment, so data scarcity is common. Data-sharing is therefore widely enabled among multiple machines with similar functionality to augment the dataset for building ML methods. However, distribution mismatch inevitably exists in their data due to different working conditions, while ML methods assume that they are built and tested on data following the same distribution. Thus, an Active Data-sharing (ADs) framework is proposed to ensure the quality of the shared data among multiple machines. It is designed to simultaneously select the most informative data points for the downstream tasks and mitigate the distribution mismatch among all selected data points. The proposed method is validated on anomaly detection using in-situ monitoring data from three additive manufacturing processes.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to share data effectively so that machine learning (ML) models in advanced manufacturing systems keep their quality and performance under data scarcity and distribution mismatch. It introduces a framework named "Active Data-sharing (ADs)" that assures the quality of data shared among multiple machines with similar functions, thereby improving the performance of ML methods.
Data scarcity is common in manufacturing because data collection is costly and time-consuming. The development of the Industrial Internet of Things (IIoT) has made data sharing between machines possible, which can augment the datasets used to build ML models. Although these machines are similarly designed, their data distributions inevitably differ because of variations in working conditions, process parameters, measurement noise, and other factors. ML methods typically assume that training and testing data come from the same distribution, so an intelligent data-sharing framework is needed to assure the quality of the shared data and to share only information that improves ML performance.
ADs is designed as a self-supervised learning framework that combines the architectures of Contrastive Learning (CL) and Active Learning (AL). It develops a novel acquisition function for AL that integrates a measure of informativeness for downstream tasks with a similarity score for data quality assurance. Experimental results show that ADs can intelligently share monitoring data between the same machines while excluding data points from different machines during the training of ML methods. With the high-quality augmented dataset generated by the framework, ML methods can achieve 95.78% accuracy using only 26% of the labeled data, a 1.41% improvement over the baseline method that uses 100% of the labeled data.
In summary, the paper's objective is an intelligent data-sharing framework that selects the most informative data subset for downstream tasks while accounting for data quality and distribution mismatch, reducing the impact of low-quality data. To achieve this, ADs combines the advantages of AL and CL by optimizing two objectives: selecting the data points most beneficial for downstream tasks, and ensuring that the selected data follow the target distribution.
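The idea of an acquisition function that weighs informativeness against similarity to the target distribution can be illustrated with a minimal sketch. This is not the paper's actual acquisition function, which this summary does not specify. As assumptions made purely for illustration, informativeness is taken to be normalized predictive entropy, similarity is the cosine similarity between a candidate embedding and the centroid of the target machine's embeddings, and the two are combined by multiplication. All function and variable names below are hypothetical.

```python
import numpy as np


def predictive_entropy(probs):
    """Normalized entropy in [0, 1]; higher means the model is less certain."""
    probs = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(probs * np.log(probs), axis=1)
    return entropy / np.log(probs.shape[1])


def similarity_to_target(candidate_emb, target_emb):
    """Cosine similarity to the target centroid, rescaled from [-1, 1] to [0, 1]."""
    centroid = target_emb.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    normed = candidate_emb / np.linalg.norm(candidate_emb, axis=1, keepdims=True)
    return (normed @ centroid + 1.0) / 2.0


def select_for_sharing(probs, candidate_emb, target_emb, budget):
    """Rank candidates by (informativeness x similarity) and keep the top `budget`.

    The product is an illustrative choice (assumption): a point scores high
    only if it is both uncertain and close to the target distribution.
    """
    scores = predictive_entropy(probs) * similarity_to_target(candidate_emb, target_emb)
    ranked = np.argsort(scores)[::-1]
    return ranked[:budget], scores


# Toy data (made up): embeddings from the target machine and three candidates.
target_emb = np.array([[1.0, 0.0], [0.9, 0.1]])
candidate_emb = np.array([
    [1.0, 0.05],   # close to the target, model uncertain
    [-1.0, 0.0],   # far from the target, model uncertain
    [1.0, 0.0],    # close to the target, model confident
])
probs = np.array([[0.5, 0.5], [0.5, 0.5], [0.99, 0.01]])

chosen, scores = select_for_sharing(probs, candidate_emb, target_emb, budget=1)
print(chosen, scores)
```

In this toy example the first candidate is selected: it is the only one that is both uncertain and similar to the target data. The mismatched candidate is suppressed by its low similarity score even though the model is equally uncertain about it, which mirrors the goal of excluding data from machines whose distribution differs.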