Machine learning-based techniques for fault diagnosis in the semiconductor manufacturing process: a comparative study
Abubakar Abdussalam Nuhu, Qasim Zeeshan, Babak Safaei, Muhammad Atif Shahzad
DOI: https://doi.org/10.1007/s11227-022-04730-x
2022-08-08
Abstract: Industries are going through the fourth industrial revolution (Industry 4.0), in which technologies such as the Industrial Internet of Things, big data analytics, and machine learning (ML) are used extensively to improve the productivity and efficiency of manufacturing systems and processes. This work investigates the applicability of ML prediction models for fault diagnosis in smart manufacturing processes and aims to improve their effectiveness. To that end, we propose several methodologies and ML models for fault diagnosis in smart manufacturing applications. A case study was conducted on a real dataset from a semiconductor manufacturing (SECOM) process. This dataset contains missing values, noisy features, and a class imbalance problem. The imbalance makes it difficult to predict the minority class accurately, because of the difference in size between the majority and minority classes. In the literature, efforts have been made to alleviate the class imbalance problem on the UCI Machine Learning Repository SECOM dataset using several synthetic data generation techniques (SDGT). In this work, to handle the imbalance problem, we employed three SDGT on this dataset and compared and evaluated their feasibility. To handle the missing values and noisy features, we implemented two missing-value imputation techniques and feature selection techniques, respectively. We then developed ten predictive ML models and compared their performance under these proposed methodologies. The results obtained across several performance evaluation metrics were significant. A comparative analysis shows the feasibility and validates the effectiveness of these SDGT and the proposed methodologies. Some of the proposed methodologies yielded accuracies in the range of 99.5% to 100%. Furthermore, in a comparative analysis with similar models from the literature, our proposed models outperformed them.
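The preprocessing steps named in the abstract (missing-value imputation followed by synthetic data generation for the minority class) can be illustrated with a minimal sketch. The abstract does not say which imputation techniques or SDGT were used, so the column-mean imputation and the SMOTE-style nearest-neighbour interpolation below are stand-ins chosen for illustration, not the authors' methods. The function names and the toy data are hypothetical, and feature selection and the ten classifiers are omitted.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def smote_like(minority, n_new, rng):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and its nearest minority neighbour.
    Assumes at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        others = [p for p in minority if p is not a]
        b = min(others, key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def balance(X, y, minority_label=1, seed=0):
    """Add synthetic minority samples until both classes have equal size."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(len(y) - 2 * len(minority), 0)
    new = smote_like(minority, n_new, rng)
    return X + new, y + [minority_label] * len(new)


# Toy data invented for illustration (not SECOM); label 1 marks a fault.
X_raw = [[1.0, 2.0], [1.2, None], [0.9, 2.1], [1.1, 1.9], [5.0, 8.0], [5.5, None]]
y = [0, 0, 0, 0, 1, 1]

X = impute_mean(X_raw)
X_bal, y_bal = balance(X, y)
```

In a real evaluation, synthetic samples would normally be generated from the training split only, so that the test set keeps the original class distribution.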
computer science, theory & methods,engineering, electrical & electronic, hardware & architecture