Prediction and Causal Analysis of Defects in Steel Products: Handling Nonnegative and Highly Overdispersed Count Data

Xinmin Zhang, Manabu Kano, Masahiro Tani, Junichi Mori, Junji Ise, Kohhei Harada
DOI: https://doi.org/10.1016/j.conengprac.2019.104258
IF: 4.057
2020-01-01
Control Engineering Practice
Abstract: In the steel industry, defects may occur during the manufacturing process. It is therefore important to predict the occurrence of defects in steel products online and to identify the causal variables that may lead to them. However, the characteristics of observed defect count data, namely nonnegative integer values and high overdispersion, pose difficulties for traditional probability models. To address this issue, the present work employs random forests to model and analyze the observed defect count data. Random forests are a nonlinear ensemble learning technique that constructs multiple regression trees during the training phase and then predicts the output by averaging the predictions of the individual trees. Unlike traditional probability models, which rest on a specific distributional assumption, random forests are non-parametric (distribution-free). Furthermore, random forests guarantee nonnegative predictions, which makes them suitable for modeling defect count data. In addition, partial dependence analysis in conjunction with the variable importance measure was used to identify the causal variables. Application results on a real steelmaking process demonstrate that random forests outperform the partial least squares (PLS), support vector regression (SVR), Poisson, and negative binomial (NB) methods in prediction accuracy. Moreover, the most influential variables identified by random forests are in line with operator experience.
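The nonnegativity property and the partial dependence analysis mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses bagged depth-1 regression trees (stumps) with a randomly chosen split feature as a simplified stand-in for a full random forest, and the data are synthetic gamma-Poisson counts. All names here (`make_data`, `fit_stump`, `fit_forest`, `partial_dependence`, the features `x1` and `x2`) are hypothetical. Each leaf stores the mean of nonnegative training counts, so the ensemble average cannot go below zero.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def make_data(n, rng):
    """Synthetic gamma-Poisson counts (assumed setup); only x1 drives the mean."""
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        mu = 2.0 + 10.0 * x1
        lam = mu * rng.gammavariate(0.5, 2.0)  # gamma noise with mean 1
        X.append([x1, x2])
        y.append(poisson(lam, rng))
    return X, y

def fit_stump(X, y, rng):
    """Fit a depth-1 regression tree on one randomly chosen feature."""
    j = rng.randrange(len(X[0]))
    best = None
    values = sorted({row[j] for row in X})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[j] <= t]
        right = [yi for row, yi in zip(X, y) if row[j] > t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # feature is constant in this sample: predict the mean
        m = statistics.fmean(y)
        return (j, 0.0, m, m)
    return (j, best[1], best[2], best[3])

def fit_forest(X, y, n_trees, rng):
    """Train each stump on a bootstrap resample of the data (bagging)."""
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Average the leaf means of all stumps; leaf means are >= 0 for counts."""
    return statistics.fmean([ml if row[j] <= t else mr for j, t, ml, mr in forest])

def partial_dependence(forest, X, j, grid):
    """Mean prediction when feature j is forced to each grid value."""
    curve = []
    for v in grid:
        preds = [predict(forest, row[:j] + [v] + row[j + 1:]) for row in X]
        curve.append(statistics.fmean(preds))
    return curve

rng = random.Random(0)
X, y = make_data(150, rng)
dispersion = statistics.pvariance(y) / statistics.fmean(y)  # > 1: overdispersed
forest = fit_forest(X, y, 30, rng)
preds = [predict(forest, row) for row in X]
pd_curve = partial_dependence(forest, X, 0, [0.1, 0.5, 0.9])
print("dispersion index:", round(dispersion, 2))
print("smallest prediction:", round(min(preds), 3))
print("partial dependence on x1:", [round(v, 2) for v in pd_curve])
```

A variance-to-mean ratio well above 1 is what violates the Poisson assumption of equal mean and variance. In practice a library implementation with fully grown trees and a built-in importance measure would replace the stumps used here.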