Abstract: A cross-benchmark of three critical aspects of machine-learning-based chemical vapor deposition (CVD) virtual metrology (VM), namely data imputation, feature selection, and regression algorithms, has been carried out. The results reveal that linear feature selection and regression algorithms severely under-fit the VM data. Data imputation is also necessary to achieve higher prediction accuracy, because data availability is only ~70% at the point where optimal accuracy is obtained. This work suggests that nonlinear feature selection and regression algorithms combined with a nearest-neighbor data imputation algorithm can provide a prediction accuracy as high as 0.7. This would correspond to a 70% reduction in CVD processing variation, which is expected to reduce the frequency of physical metrology and yield more reliable mass-produced wafers of improved quality.

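The abstract's step from a prediction accuracy of 0.7 to a 70% reduction in processing variation holds if "accuracy" is the coefficient of determination, R² = 1 - SS_res / SS_tot. The abstract does not name the metric, so this is an assumption. Under that reading, subtracting the VM prediction leaves a fraction 1 - R² of the original squared variation. Because R² is defined on squared errors, the 70% refers to variance; the standard deviation shrinks by a factor of about √0.3 ≈ 0.55. The minimal NumPy sketch below shows this. `r2_score` and `residual_variation_fraction` are hypothetical hand-rolled helpers, not the paper's code and not scikit-learn's function of the same name.

```python
import numpy as np


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (hand-rolled sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation left after the VM prediction
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # raw process variation around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true has zero variance")
    return float(1.0 - ss_res / ss_tot)


def residual_variation_fraction(y_true, y_pred) -> float:
    """Fraction of the original squared variation (SS_res / SS_tot) left unexplained."""
    return 1.0 - r2_score(y_true, y_pred)


# Example: SS_res = 3 and SS_tot = 10, so R^2 = 0.7 and 30% of the squared variation remains.
# r2_score([0, 1, 2, 3, 4], [1, 0, 3, 3, 4])  ->  approximately 0.7
```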
What problem does this paper attempt to address?

The problem this paper attempts to solve is how to improve the prediction accuracy of chemical vapor deposition (CVD) virtual metrology (VM) with machine-learning methods in large-scale semiconductor manufacturing. Specifically, the study evaluates three key aspects, data imputation, feature selection, and regression algorithms, to find the combination that yields the highest prediction accuracy. A more accurate VM model reduces the frequency of physical metrology and improves wafer quality control in mass production, thereby lowering production costs and improving product reliability.

### Main problems

1. **Improving prediction accuracy**: Existing linear feature selection and regression algorithms tend to under-fit CVD virtual metrology data, which limits prediction accuracy.
2. **Handling missing data**: Datasets from actual production usually contain many missing values, so imputing them effectively is key to improving prediction performance.
3. **Optimizing feature selection**: The most representative features must be selected from a large amount of sensor data so that the model neither over-fits nor under-fits.

### Research objectives

- Combine nonlinear feature selection and regression algorithms with nearest-neighbor data imputation to reach a test accuracy of up to 0.7, i.e., to explain 70% of the process variation.
- Reduce CVD process variation, thereby reducing the frequency of physical metrology, lowering production costs, and improving quality control.

### Methodology

1. **Data overview**: All features are normalized to the range [0, 1], and the dataset is split into training, development, and test sets in a 70% / 15% / 15% ratio.
2. **Missing-value imputation**: Five common imputation algorithms are compared: ARIMA, KNN, nearest-neighbor, random imputation, and random forest.
3. **Feature selection and regression**: Six regression algorithms (linear least squares, partial least squares, Bayesian ridge regression, support vector regression, gradient boosting, and neural network) are evaluated, each combined with a corresponding feature selection algorithm.

### Results

- Nonlinear algorithms (such as gradient boosting and neural networks) perform significantly better than linear algorithms on the test set, because the linear algorithms under-fit.
- Nearest-neighbor imputation performs best among all imputation methods, while random imputation performs worst.
- Test accuracy peaks at roughly 100 input features; at this point, the imputed data plays a key role in improving test accuracy.

### Conclusion

This study is the first to systematically evaluate multiple data imputation, feature selection, and regression algorithms for CVD virtual metrology, and it demonstrates the effectiveness of combining nonlinear algorithms with nearest-neighbor imputation. This combination achieves a prediction accuracy of 0.7 on the test set, meaning that 70% of random process variation can be removed, which lays a foundation for the future development of virtual metrology.

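The data-preparation steps listed under Methodology, along with the data-availability figure quoted in the abstract, can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the paper's code, and all function names are hypothetical:

- Rows are processed wafers in time order, and columns are sensor features.
- "Nearest-neighbor" imputation means copying the closest observed value of the same feature along the row axis. The source lists it separately from KNN but does not define it.
- Random imputation draws from the observed values of the same feature.
- The 70/15/15 split uses a seeded random permutation.

Min-max scaling is a per-column affine map, and this nearest imputation only copies values within a column. The two steps therefore commute when NaN-aware minima and maxima are used.

```python
import numpy as np


def availability(X: np.ndarray) -> float:
    """Fraction of entries that are observed (not NaN)."""
    return float(1.0 - np.isnan(X).mean())


def minmax_normalize(X: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1], ignoring NaNs; constant columns map to 0.
    Assumes every column has at least one observed value."""
    col_min = np.nanmin(X, axis=0)
    col_max = np.nanmax(X, axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span


def nearest_impute(X: np.ndarray) -> np.ndarray:
    """Hypothetical nearest imputation: fill each NaN with the closest observed value in
    the same column, measured by row index (assumed processing order). Ties go to the
    earlier row."""
    X = X.copy()
    rows = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if not missing.any() or missing.all():
            continue  # nothing to fill, or no observed value to copy from
        obs = rows[~missing]
        miss = rows[missing]
        pos = np.searchsorted(obs, miss)                # index of first observed row after each gap
        left = obs[np.clip(pos - 1, 0, len(obs) - 1)]   # closest observed row before (or first)
        right = obs[np.clip(pos, 0, len(obs) - 1)]      # closest observed row after (or last)
        src = np.where(np.abs(miss - left) <= np.abs(right - miss), left, right)
        X[miss, j] = X[src, j]
    return X


def random_impute(X: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Baseline: fill each NaN with a random draw from the same column's observed values."""
    X = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        observed = X[~missing, j]
        if missing.any() and observed.size > 0:
            X[missing, j] = rng.choice(observed, size=int(missing.sum()))
    return X


def split_70_15_15(n_rows: int, seed: int = 0):
    """Return shuffled train / development / test row indices in a 70/15/15 ratio."""
    idx = np.random.default_rng(seed).permutation(n_rows)
    n_train = int(round(0.70 * n_rows))
    n_dev = int(round(0.15 * n_rows))
    return idx[:n_train], idx[n_train:n_train + n_dev], idx[n_train + n_dev:]


# Example: 4 runs x 2 sensors with half the entries missing.
# X = np.array([[1.0, np.nan], [np.nan, 4.0], [3.0, np.nan], [np.nan, 8.0]])
# availability(X) -> 0.5; nearest_impute(X) -> [[1, 4], [1, 4], [3, 4], [3, 8]]
```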
Machine Learning based CVD Virtual Metrology in Mass Produced Semiconductor Process
