Machine Learning, Linear and Bayesian Models for Logistic Regression in Failure Detection Problems

B. Pavlyshenko
DOI: https://doi.org/10.48550/arXiv.1612.05740
2016-12-17
Abstract: In this work, we study the use of logistic regression in manufacturing failure detection. As a data set for the analysis, we used the data from the Kaggle competition Bosch Production Line Performance. We considered the use of machine learning, linear and Bayesian models. For the machine learning approach, we analyzed the XGBoost tree-based classifier to obtain high-scoring classification. Using the generalized linear model for logistic regression makes it possible to analyze the influence of the factors under study. The Bayesian approach for logistic regression gives the statistical distribution for the parameters of the model. It can be useful in probabilistic analysis, e.g. risk assessment.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to detect internal faults during the manufacturing process, using logistic regression as the main predictive tool. Specifically, it focuses on:

1. **Fault Detection in the Manufacturing Process**: Parts pass through multiple production steps, and a large amount of measurement and test data is recorded along the way. These data can be used to improve the manufacturing process, but their volume and complexity make them difficult for current methods to handle effectively. In the Kaggle competition "Bosch Production Line Performance", the goal is to predict which parts will fail quality control (i.e., internal faults).
2. **Highly Imbalanced Data Set**: The classes in the competition data set are highly imbalanced: positive (fault) samples are far fewer than negative (non-fault) samples. This imbalance poses a challenge to traditional classification algorithms.
3. **Application and Comparison of Multiple Models**: To address these problems, the paper explores several modeling methods:
   - **Machine Learning Methods**: Use gradient-boosted tree classifiers such as XGBoost to obtain high-scoring classification results.
   - **Generalized Linear Model (GLM)**: Analyze the influence of individual factors on fault detection through logistic regression.
   - **Bayesian Model**: Obtain the probability distribution of the model parameters through Bayesian inference, which supports risk assessment.
4. **Combination of Multi-level Models**: The paper also proposes a multi-level model that combines machine learning models with linear or Bayesian models to improve prediction accuracy. For example, XGBoost models with different parameter settings make predictions at the first level, and linear or Bayesian regression then fuses these predictions at the second level.

### Summary

The core problem of the paper is to predict internal faults in the manufacturing process using a variety of statistical and machine learning methods, especially logistic regression, with particular attention to the strong class imbalance in the data and to the effective combination of different models.
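
The two-level combination described above can be illustrated with a small, self-contained sketch. It is not the paper's implementation: no XGBoost model is trained here. Instead, the first-level scores are synthetic stand-ins generated by a hypothetical helper, `make_first_level_scores`, and the 5% fault rate, the noise level, and the three "models" are illustrative assumptions. Only the second level is shown, as a plain logistic regression that fuses the log-odds of the first-level scores. All function names are made up for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def make_first_level_scores(n, fail_rate, n_models, rng):
    """Synthetic stand-in for held-out scores from n_models first-level classifiers.

    Assumption: in a real pipeline these would be out-of-fold predictions
    from, e.g., XGBoost models trained with different parameter settings.
    """
    scores, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < fail_rate else 0
        signal = 2.0 if y else -2.0
        # Each hypothetical model sees the same signal plus its own noise.
        scores.append([sigmoid(signal + rng.gauss(0.0, 1.5)) for _ in range(n_models)])
        labels.append(y)
    return scores, labels


def fit_second_level(scores, labels, lr=0.1, epochs=150):
    """Logistic regression on the log-odds of first-level scores (batch gradient descent)."""
    n, k = len(scores), len(scores[0])
    feats = [[logit(p) for p in row] for row in scores]
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(feats, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def blend(w, b, row):
    """Fused fault probability for one part, given its first-level scores."""
    return sigmoid(b + sum(wj * logit(p) for wj, p in zip(w, row)))


def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels under the given probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(labels)


def run_demo(seed=0):
    rng = random.Random(seed)
    # Illustrative sizes and a 5% fault rate; none of these come from the paper.
    train_x, train_y = make_first_level_scores(1500, 0.05, 3, rng)
    test_x, test_y = make_first_level_scores(1500, 0.05, 3, rng)
    w, b = fit_second_level(train_x, train_y)
    blended = [blend(w, b, row) for row in test_x]
    single = [log_loss([row[j] for row in test_x], test_y) for j in range(3)]
    return {
        "weights": w,
        "bias": b,
        "blended_loss": log_loss(blended, test_y),
        "single_losses": single,
    }


result = run_demo()
print("second-level weights:", result["weights"])
print("second-level bias:", result["bias"])
print("log loss of each first-level score:", result["single_losses"])
print("log loss of the blended score:", result["blended_loss"])
```

In this toy setting the second level does two things: it averages out the independent noise of the first-level scores, and its intercept recalibrates them toward the low fault rate. In a real pipeline the second level would normally be fitted on first-level predictions for data the first-level models did not train on, so that the fusion weights do not reward overfitting.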