Machine Learning based CVD Virtual Metrology in Mass Produced Semiconductor Process

Yunsong Xie, Ryan Stearrett
DOI: https://doi.org/10.48550/arXiv.2107.05071
2021-07-28
Abstract: A cross-benchmark has been done on three critical aspects, data imputing, feature selection and regression algorithms, for machine learning based chemical vapor deposition (CVD) virtual metrology (VM). The result reveals that linear feature selection and regression algorithms would extensively under-fit the VM data. Data imputing is also necessary to achieve a higher prediction accuracy, as the data availability is only ~70% when optimal accuracy is obtained. This work suggests that a nonlinear feature selection and regression algorithm combined with the nearest data imputing algorithm would provide a prediction accuracy as high as 0.7. This would lead to a 70% reduction in CVD processing variation, which is believed to lead to a reduced frequency of physical metrology as well as more reliable mass-produced wafers with improved quality.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks how to improve the prediction accuracy of chemical vapor deposition (CVD) virtual metrology (VM) with machine-learning methods in mass-production semiconductor manufacturing. Specifically, it benchmarks three key aspects (data imputation, feature selection, and regression algorithms) to find the combination that gives the highest prediction accuracy. Higher accuracy helps reduce the frequency of physical metrology and improves wafer quality control in mass production, thereby lowering production costs and improving product reliability.

### Main problems

1. **Improving prediction accuracy**: Linear feature selection and regression algorithms tend to under-fit CVD virtual metrology data, resulting in insufficient prediction accuracy.
2. **Handling missing data**: Production data sets usually contain a large number of missing values, so imputing them effectively is key to improving prediction performance.
3. **Optimizing feature selection**: The most representative features must be selected from a large amount of sensor data so that the model neither over-fits nor under-fits.

### Research objectives

- Combine nonlinear feature selection and regression algorithms with the nearest-neighbor data imputation algorithm to achieve a test accuracy of up to 0.7, that is, to explain 70% of the process variation.
- Reduce variation in CVD processing, thereby reducing the frequency of physical metrology and, ultimately, reducing production costs and improving quality control.

### Methodology

1. **Data overview**: Normalize all features to lie between 0 and 1, and divide the data set into training, development, and test sets in the ratio 70% / 15% / 15%.
2. **Missing-value imputation**: Compare five common data imputation algorithms: ARIMA, KNN, nearest-neighbor, random imputation, and random forest.
3. **Feature selection and regression**: Evaluate six regression algorithms (linear least squares, partial least squares, Bayesian ridge regression, support vector regression, gradient boosting, and neural network), each combined with a corresponding feature selection algorithm.

### Results

- Nonlinear algorithms (such as gradient boosting and neural networks) perform significantly better than linear algorithms on the test set, because linear algorithms tend to under-fit.
- Nearest-neighbor imputation performs best among the imputation methods, while random imputation performs worst.
- Test accuracy peaks when the number of input features is approximately 100; at that point, the imputed data plays a key role in improving test accuracy.

### Conclusion

This research systematically evaluates, for the first time, multiple data imputation, feature selection, and regression algorithms for CVD virtual metrology, and demonstrates the effectiveness of combining nonlinear algorithms with nearest-neighbor imputation. This combination reaches a prediction accuracy of 0.7 on the test set, which means that 70% of random process variation can be reduced, laying the foundation for the future development of virtual metrology.
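
The data-preparation steps in the methodology and the meaning of the 0.7 score can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: all function names are hypothetical, "nearest" imputation is assumed to mean copying the closest observed value in wafer (row) order, and the reported accuracy is assumed to be the coefficient of determination R², which matches the summary's reading of 0.7 as "explaining 70% of the process variation". The regression models themselves are left out.

```python
import random
from typing import List, Optional, Sequence, Tuple


def min_max_normalize(column: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Scale observed values to [0, 1]; missing entries (None) stay missing."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)
    lo, hi = min(observed), max(observed)
    span = hi - lo
    if span == 0:
        # A constant column has no spread; map every observed value to 0.0.
        return [None if v is None else 0.0 for v in column]
    return [None if v is None else (v - lo) / span for v in column]


def nearest_impute(column: Sequence[Optional[float]]) -> List[float]:
    """Fill each missing entry with the closest observed value by row index.

    Assumption: rows are in processing order, so "closest" means the
    nearest wafer in time. Ties go to the earlier row.
    """
    known = [i for i, v in enumerate(column) if v is not None]
    if not known:
        raise ValueError("column has no observed values to impute from")
    filled = []
    for i, v in enumerate(column):
        if v is None:
            j = min(known, key=lambda k: (abs(k - i), k))
            v = column[j]
        filled.append(v)
    return filled


def split_70_15_15(n_rows: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Return shuffled row indices for the training, development and test sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = n_rows * 70 // 100
    n_dev = n_rows * 15 // 100
    return idx[:n_train], idx[n_train:n_train + n_dev], idx[n_train + n_dev:]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    A value of 0.7 means the predictions account for 70% of the
    variance of the measured target around its mean.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical sensor readings for one feature, with gaps.
    sensor = [2.0, None, 6.0, 4.0, None]
    prepared = nearest_impute(min_max_normalize(sensor))
    train, dev, test = split_70_15_15(n_rows=100)
    print(prepared, len(train), len(dev), len(test))
```

The sketch normalizes before splitting because that is the order the summary gives; computing the scaling range on the training set only is a common alternative that avoids leaking test-set statistics.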