Abstract: In this work, we study the use of logistic regression for detecting manufacturing failures. The data set for the analysis comes from the Kaggle competition Bosch Production Line Performance. We consider machine learning, linear, and Bayesian models. For the machine learning approach, we analyze the XGBoost tree-based classifier to obtain a high-scoring classification. The generalized linear model for logistic regression makes it possible to analyze the influence of the factors under study. The Bayesian approach to logistic regression yields a statistical distribution for the model parameters, which can be useful in probabilistic analysis, e.g. risk assessment.
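
For reference, the logistic regression model named in the abstract can be written in its standard textbook form. The notation here ($x_{ij}$ for factor $j$ of part $i$, $\beta_j$ for its coefficient, $p(\beta)$ for a prior) is assumed for illustration and is not taken from the paper.

```latex
\begin{align}
P(y_i = 1 \mid x_i) &= \sigma\!\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
\log \frac{P(y_i = 1 \mid x_i)}{1 - P(y_i = 1 \mid x_i)} &= \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \\
p(\beta \mid X, y) &\propto p(\beta) \prod_{i=1}^{n}
  P(y_i = 1 \mid x_i)^{y_i} \bigl(1 - P(y_i = 1 \mid x_i)\bigr)^{1 - y_i}
\end{align}
```

The second line is why the linear model supports factor analysis: each $\beta_j$ is the change in the log-odds of failure per unit increase in factor $x_j$. The third line is Bayes' rule applied to the same likelihood, which yields a distribution over $\beta$ instead of a single point estimate.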

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to effectively detect internal faults during the manufacturing process, especially by using logistic regression for prediction. Specifically, the paper focuses on:

1. **Fault Detection in the Manufacturing Process**: During manufacturing, parts go through multiple procedures, and a large amount of measurement and test data is recorded. These data can be used to improve the manufacturing process, but their complexity and volume make them difficult for current methods to handle effectively. In particular, in the Kaggle competition "Bosch Production Line Performance" provided by Bosch, the goal is to predict which parts will fail quality control (i.e., internal faults).
2. **Highly Imbalanced Data Set**: The classification categories in the competition data set are highly imbalanced, that is, the positive class (fault) samples are far fewer than the negative class (non-fault) samples. This imbalance poses a challenge to traditional classification algorithms.
3. **Application and Comparison of Multiple Models**: To address the above problems, the paper explores different modeling methods, including:
   - **Machine Learning Methods**: Use gradient-boosted tree classifiers such as XGBoost to obtain high-scoring classification results.
   - **Generalized Linear Model (GLM)**: Analyze the influence of various factors on fault detection through logistic regression.
   - **Bayesian Model**: Obtain the probability distribution of model parameters through Bayesian inference, which can support risk assessment.
4. **Combination of Multi-level Models**: The paper also proposes a multi-level model that combines machine learning models with linear or Bayesian models to improve prediction accuracy. For example, XGBoost models with different parameter settings make predictions at the first level, and linear or Bayesian regression then fuses these predictions at the second level.

### Summary

The core problem of the paper is to use a variety of statistical and machine learning methods, especially logistic regression, to predict internal faults in the manufacturing process, with particular attention to the high imbalance of the data and the effective combination of different models.
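
The two-level combination in item 4 can be illustrated with a minimal sketch, which is not the paper's implementation. The first-level outputs are simulated by a hypothetical `simulate_first_level` helper standing in for three XGBoost models with different settings. The second level is a plain logistic regression fitted by gradient descent. The class weighting, the 5% failure rate, and all other numbers are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_first_level(n=1000, failure_rate=0.05, seed=0):
    """Hypothetical stand-in for first-level model outputs.

    Returns one row per part holding the log-odds predicted by three
    first-level classifiers, plus the true label (1 = failure).
    The three noise levels mimic models of differing quality.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = 1 if rng.random() < failure_rate else 0
        signal = 2.0 * label - 1.5
        rows.append([signal + rng.gauss(0.0, sd) for sd in (0.8, 1.0, 1.2)])
        labels.append(label)
    return rows, labels


def predict_proba(beta, row):
    """Second-level failure probability; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))


def fit_logistic(rows, labels, weights, lr=0.1, epochs=300):
    """Weighted logistic regression fitted by batch gradient descent."""
    beta = [0.0] * (len(rows[0]) + 1)
    total = sum(weights)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for row, label, w in zip(rows, labels, weights):
            err = w * (predict_proba(beta, row) - label)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b - lr * g / total for b, g in zip(beta, grad)]
    return beta


rows, labels = simulate_first_level()
n_pos = sum(labels)
n_neg = len(labels) - n_pos
# Assumption: up-weight the rare failure class so both classes
# carry equal total weight during fitting.
weights = [n_neg / max(n_pos, 1) if label else 1.0 for label in labels]
beta = fit_logistic(rows, labels, weights)

print("intercept:", round(beta[0], 3))
print("weights for the three first-level models:",
      [round(b, 3) for b in beta[1:]])
```

The fitted coefficients show how much the second level relies on each first-level model. In practice the second level would be trained on out-of-fold first-level predictions to avoid leakage. Note also that class weighting shifts the predicted probabilities, so they would need recalibration before being read as failure risks.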

Machine Learning, Linear and Bayesian Models for Logistic Regression in Failure Detection Problems

A Novel Bayesian Robust Model and Its Application for Fault Detection and Automatic Supervision of Nonlinear Process

Predicting machine failures from multivariate time series: an industrial case study

Predicting Future Machine Failure from Machine State Using Logistic Regression

Analyzing Risk of Service Failures in Heavy Haul Rail Lines: A Hybrid Approach for Imbalanced Data

Modeling failures in smart grids by a bilinear logistic regression approach

Using Big Data to Enhance the Bosch Production Line Performance: A Kaggle Challenge

Adoption of machine learning technology for failure prediction in industrial maintenance: A systematic review

Using machine learning and deep learning algorithms for downtime minimization in manufacturing systems: an early failure detection diagnostic service

Machine Learning in High Volume Media Manufacturing

Adaptable and Explainable Predictive Maintenance: Semi-Supervised Deep Learning for Anomaly Detection and Diagnosis in Press Machine Data

A two-level machine learning framework for predictive maintenance: comparison of learning formulations

Growth Kinetics and Competition Between Methanosarcina and Methanosaeta in Mesophilic Anaerobic Digestion

Strategies for overcoming data scarcity, imbalance, and feature selection challenges in machine learning models for predictive maintenance

Failure prediction in production line based on federated learning: an empirical study

Machine Learning Application Using Cost-Effective Components for Predictive Maintenance in Industry: A Tube Filling Machine Case Study

Using machine learning prediction models for quality control: a case study from the automotive industry

Data models for service failure prediction in supply-chain networks

Model-Driven Bayesian Network Learning for Factory-Level Fault Diagnostics and Resilience

Interpretable Machine Learning Models for Failure Cause Prediction in Imbalanced Oil Pipeline Data