A Probabilistic Approach for Missing Data Imputation

Muhammed Nazmul Arefin, Abdul Kadar Muhammad Masum
DOI: https://doi.org/10.1155/2024/4737963
IF: 2.3
2024-01-20
Complexity
Abstract: In data analysis, missing data imputation is a vital issue because datasets are typically large and complex, which often results in a higher incidence of missing data. Addressing missing data through imputation is therefore essential to ensure the integrity and completeness of the data, which ultimately improves the accuracy and validity of the analysis. The prime objective of this study is to propose an imputation model. This paper presents a method for imputing missing employee data through a combination of features and probability calculations. The study used employee datasets collected from Kaggle along with primary data collected from RMG factories located in Chittagong. The suggested algorithm demonstrated a notable level of accuracy on the datasets, and the average accuracy for each identified technique was also quite satisfactory. This study contributes to the existing body of research on missing data imputation in big data analysis and offers practical implications for handling missing data in different datasets. Using this technique will enhance the accuracy of data analysis and decision-making in organizations.
mathematics, interdisciplinary applications, multidisciplinary sciences
What problem does this paper attempt to address?
The paper attempts to address the issue of missing data in data analysis. Specifically:

- **Missing Data Handling**: In large-scale and complex datasets, missing data is a common problem. Handling it with imputation techniques helps ensure the integrity and accuracy of the data, thereby improving the effectiveness of data analysis.
- **Handling Missing Values in Employee Datasets**: The paper proposes a method that combines features and probability calculations to fill in missing values in employee datasets, and verifies the accuracy and effectiveness of this method on actual datasets.
- **Improvement of Existing Methods**: Compared to existing single imputation, multiple imputation, and machine learning-based methods, the proposed method demonstrates better results.

In summary, the main goal of the paper is to propose a new imputation model that improves the accuracy and reliability of handling missing data in different datasets.
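The summary above does not spell out the paper's algorithm, so the following is only a minimal sketch of one way "features plus probability calculations" could be combined, not the authors' method. It assumes a missing categorical value is filled with the most frequent observed value among rows that share the same feature values (the conditional mode), falling back to the overall most frequent value when that feature combination was never observed. The function name `impute_by_conditional_mode` and the columns `department`, `grade`, and `shift` are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict


def impute_by_conditional_mode(rows, target, features):
    """Fill missing `target` values with the most probable observed value
    given the row's `features` (hypothetical helper, not the paper's algorithm)."""
    cond_counts = defaultdict(Counter)  # feature combination -> target value counts
    overall = Counter()                 # target value counts over all complete rows

    # Estimate frequencies from rows where the target is observed.
    for row in rows:
        value = row.get(target)
        if value is None:
            continue
        key = tuple(row.get(f) for f in features)
        cond_counts[key][value] += 1
        overall[value] += 1

    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row.get(target) is None:
            key = tuple(new_row.get(f) for f in features)
            # Prefer the conditional counts; fall back to the overall counts
            # when this feature combination has no observed target value.
            counts = cond_counts.get(key) or overall
            if counts:
                new_row[target] = counts.most_common(1)[0][0]
        imputed.append(new_row)
    return imputed


# Illustrative (made-up) employee records; None marks a missing value.
rows = [
    {"department": "Sewing", "grade": "A", "shift": "Day"},
    {"department": "Sewing", "grade": "A", "shift": "Day"},
    {"department": "Sewing", "grade": "A", "shift": "Night"},
    {"department": "Cutting", "grade": "B", "shift": "Night"},
    {"department": "Cutting", "grade": "B", "shift": "Night"},
    {"department": "Sewing", "grade": "A", "shift": None},
    {"department": "Finishing", "grade": "C", "shift": None},
]

filled = impute_by_conditional_mode(rows, "shift", ["department", "grade"])
```

In this toy data, the Sewing/A row is filled from the conditional counts for that feature combination, while the Finishing/C row has no matching complete rows and falls back to the overall counts. A fuller treatment would also need to handle numeric columns and ties, which this sketch ignores.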