Performance evaluation of different regression models: application in a breast cancer patient data

Mona Mahmoud Abo El Nasr, Alaa A. Abdelmegaly, Doaa A. Abdo
DOI: https://doi.org/10.1038/s41598-024-62627-6
IF: 4.6
2024-06-08
Scientific Reports
Abstract: This paper provides a comprehensive analysis of linear regression models, focusing on addressing multicollinearity challenges in breast cancer patient data. Linear regression methodologies, including GAM, Beta, GAM Beta, Ridge, and Beta Ridge, are compared using two statistical criteria. The study, conducted with R software, showcases the Beta regression model's exceptional performance, achieving a BIC of −5520.416. Furthermore, the Ridge regression model demonstrates remarkable results with the best AIC at −8002.647. The findings underscore the practical application of these models in real-world scenarios and emphasize the Beta regression model's superior ability to handle multicollinearity challenges. The preference for AIC over BIC in Generalized Additive Models (GAMs) is rooted in the AIC's calculation framework, highlighting its effectiveness in capturing the complexity and flexibility inherent in GAMs.
multidisciplinary sciences
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses the challenges posed by multicollinearity in breast cancer patient data by comparing how well different regression models handle it. It examines the following models:

1. **Generalized Additive Model (GAM)**
2. **Beta Regression Model**
3. **GAM Beta Regression Model**
4. **Ridge Regression Model**
5. **Beta Ridge Regression Model**

The paper evaluates these models using two statistical criteria: AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). The study finds that the Beta Regression Model performs exceptionally well in handling multicollinearity, achieving the lowest BIC value (−5520.416), while the Ridge Regression Model performs best in terms of AIC (−8002.647). Additionally, the paper emphasizes the effectiveness of AIC for Generalized Additive Models.

By analyzing various variables of breast cancer patients, the paper reveals the strengths and weaknesses of the models in practical applications and provides guidance on choosing the optimal regression model for datasets with multicollinearity, with the aim of improving prediction accuracy and model stability.
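
As a rough illustration of why the two criteria can favour different models, the following Python sketch computes AIC and BIC from a fitted model's log-likelihood using the standard definitions (AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L, with k estimated parameters and n observations). The model names, log-likelihoods, parameter counts, and sample size are hypothetical values chosen for illustration; they are not taken from the paper, whose analysis was carried out in R.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits (assumed values, not from the paper):
# name -> (maximised log-likelihood, number of estimated parameters)
candidates = {
    "model_a": (120.0, 4),  # simpler model, slightly worse fit
    "model_b": (126.0, 9),  # more flexible model, slightly better fit
}
n_obs = 200  # assumed sample size

# Score every candidate under both criteria; lower is better for each.
scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}

best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])

print("best by AIC:", best_by_aic)
print("best by BIC:", best_by_bic)
```

With these assumed inputs the more flexible model wins on AIC while the simpler one wins on BIC, because BIC's per-parameter penalty ln(n) exceeds AIC's penalty of 2 once n is larger than about 7. Negative values of either criterion are not unusual: when the response lies in (0, 1), as in beta regression, density values can exceed 1 and the log-likelihood can be positive.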