Abstract: Big data analysis is becoming a daily task for companies all over the world, including Russian companies. With advances in technology and falling storage costs, companies today can collect and store large amounts of heterogeneous data. Extracting knowledge and value from such data is a challenge that all companies seeking to maintain their competitiveness and place in the market will ultimately face. This work considers an approach to studying metallurgical processes through the analysis of a large array of operational control data, using steel rolling production as an example. The aim of the work is to develop a predictive model of rolling mill roll wear based on a large array of operational control data containing information about the time of filling and unloading of rolls, the rolled assortment, the roll material, and the time during which the roll is in operation. The data were prepared for modeling by removing outliers, uncharacteristic and random measurement results (misses), and data gaps. Correlation analysis of the data showed that the dimensions and grades of rolled steel sheets, as well as the material from which the rolls are made, have the greatest influence on the wear of rolling mill rolls. Various predictive models of the technological process were then built from this array of operational control data. The adequacy of the models was assessed by the mean squared error (MSE), the coefficient of determination (R2), and the Pearson correlation coefficient (R) between the calculated and experimental values of mill roll wear. In addition, the adequacy of the models was assessed by the symmetry of the predicted values about the straight line Ypredicted = Yactual.
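The adequacy criteria named above can be illustrated with a minimal sketch. It assumes nothing about the authors' actual tooling: the function names, the `symmetry_report` helper, and the five wear values are hypothetical and chosen only to show how MSE, R2, Pearson R, and a simple symmetry check about the line Ypredicted = Yactual might be computed.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (norm_t * norm_p)


def symmetry_report(y_true, y_pred):
    """Hypothetical symmetry check: share of points below/above Ypredicted = Yactual."""
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    return {
        "mean_residual": sum(residuals) / n,
        "share_below": sum(1 for r in residuals if r < 0) / n,  # underestimated
        "share_above": sum(1 for r in residuals if r > 0) / n,  # overestimated
    }


# Made-up wear values (arbitrary units), not data from the study.
y_actual = [1.0, 2.0, 3.0, 4.0, 5.0]
y_predicted = [0.8, 1.9, 2.7, 3.6, 4.5]

print("MSE:", mse(y_actual, y_predicted))
print("R2:", r2_score(y_actual, y_predicted))
print("Pearson R:", pearson_r(y_actual, y_predicted))
print("Symmetry:", symmetry_report(y_actual, y_predicted))
```

In this toy case every prediction lies below the line Ypredicted = Yactual, so the symmetry check flags systematic underestimation even though R2 and Pearson R are high. This is why the symmetry criterion is a useful complement to the other three.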
Linear models constructed using the least squares method and cross-validation turned out to be inadequate for the research object (the coefficient of determination R2 does not exceed 0.3). The following regressions were built on the same operational control database: multivariate linear regression, multivariate Lasso, multivariate Ridge, and multivariate ElasticNet. However, these models also turned out to be inadequate for the research object. Testing these models for symmetry showed that, in all cases, the predicted values are underestimated. Models using algorithm composition were also built, namely random forest and gradient boosting. Both methods were found to be adequate for the research object (R2 = 0.798 for the random forest model and R2 = 0.847 for the gradient boosting model). Checking the predictions for symmetry about the straight line Ypredicted = Yactual showed that the random forest model tends to underestimate the predicted values (the calculated values lie below the straight line), whereas the predictions of the gradient boosting model are located symmetrically about this line. The gradient boosting model is therefore preferred, both for its higher accuracy compared with the random forest model and for the symmetry of its predictions. The predictive model of mill roll wear will allow rational use of rolls in terms of minimizing overall roll wear: it will make it possible to redistribute the existing work rolls between the stands in order to reduce the total wear of the rolls.
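Why a composition of weak learners can succeed where a linear model fails may be easier to see on a toy example. The sketch below is not the authors' implementation: it is a from-scratch gradient boosting regressor for squared loss built from one-split decision stumps, compared with an ordinary least squares line on synthetic, deliberately nonlinear data. The single feature, the parabola-shaped target, the number of rounds, and the learning rate are all illustrative assumptions.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump minimising squared error on one feature."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or err < best[0]:
            best = (err, thr, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)


def predict_stump(stump, x):
    thr, left_value, right_value = stump
    return left_value if x <= thr else right_value


def fit_gradient_boosting(xs, ys, n_rounds=200, learning_rate=0.3):
    """Each round fits a stump to the current residuals (squared-loss gradient)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def r2_score(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data: one hypothetical process feature and a nonlinear "wear" response.
xs = [i / 10 for i in range(40)]
ys = [(x - 2.0) ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
r2_linear = r2_score(ys, [slope * x + intercept for x in xs])

model = fit_gradient_boosting(xs, ys)
r2_boosted = r2_score(ys, [predict_boosted(model, x) for x in xs])

print("R2, linear fit:", round(r2_linear, 3))
print("R2, boosted stumps:", round(r2_boosted, 3))
```

Both scores here are computed on the training data, so they only show how well each model class can represent the relationship. A real comparison, like the one described above, would evaluate on held-out data or with cross-validation and would use many features rather than one.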

Prediction and Causal Analysis of Defects in Steel Products: Handling Nonnegative and Highly Overdispersed Count Data

Lognormal and Gamma Mixed Negative Binomial Model for Defects Prediction in Steel Products

Poisson mixture model for defects prediction in steelmaking

Hurdle Modeling for Defect Data with Excess Zeros in Steel Manufacturing Process

Real-Time Forecasting of Subsurface Inclusion Defects for Continuous Casting Slabs: A Data-Driven Comparative Study

Probabilistic Machine Learning Models for Predicting the Maximum Displacements of Concrete-Filled Steel Tubular Columns Subjected to Lateral Impact Loading

Attention-based Stacked Supervised Poisson Autoencoders for Defects Prediction in Casting-rolling Process

Honey bees performing varroa sensitive hygiene remove the most mite-compromised bees from highly infested patches of brood

Analyzing Risk of Service Failures in Heavy Haul Rail Lines: A Hybrid Approach for Imbalanced Data

Mixup Enhanced Linear Robust Explainable Model for Identifying Key Factors of Surface Defects in Strip Steel Manufacturing

Big Data as a Tool for Building a Predictive Model of Mill Roll Wear

On the Use of Data-Driven Machine Learning for Probabilistic Fatigue Life Prediction of Metallic Materials Based on Mesoscopic Defect Analysis

Uncertainty Quantification of Data-driven Quality Prediction Model For Realizing the Active Sampling Inspection of Mechanical Properties in Steel Production

Gaussian-Poisson Mixture Regression model for defects prediction in steelmaking

A Self-Training-based Approach for Aluminum Alloy Casting Quality Prediction

Stacked Supervised Poisson Autoencoders-Based Soft-Sensor for Defects Prediction in Steelmaking Process

A Comparative Study on Machine Learning Algorithms for Smart Manufacturing: Tool Wear Prediction Using Random Forests

Application of data-driven models to predictive maintenance: Bearing wear prediction at TATA steel

Predicting Steel Column Stability with Uncertain Initial Defects Using Bayesian Deep Learning

Intelligent Prediction of Oxygen Consumption in Steelmaking Based on Random Forest Method