Poverty prediction using E-commerce dataset and filter-based feature selection approach

Dedy Rahman Wijaya,Raden Ilham Fadhilah Ibadurrohman,Elis Hernawati,Wawa Wikusna
DOI: https://doi.org/10.1038/s41598-024-52752-7
IF: 4.6
2024-02-08
Scientific Reports
Abstract:Poverty is a problem that occurs in many countries, notably in Indonesia. The common methods used to obtain poverty information are surveys and censuses. However, this process takes a long time and uses a lot of human resources. On the other hand, governments and policymakers need a faster approach to know social-economic conditions for area development plans. Hence, in this paper, we develop e-commerce data and machine learning algorithms as a proxy for poverty levels that can provide faster information than surveys or censuses. The e-commerce dataset is used and this high-dimensional data becomes a challenge. Hence, feature selection algorithms are employed to determine the best features before building a machine learning model. Furthermore, three machine learning algorithms such as support vector regression, linear regression, and k-nearest neighbor are compared to predict the poverty rate. Hence, the contribution of this paper is to propose the combination of statistical-based feature selection and machine learning algorithms to predict the poverty rate based on e-commerce data. According to the experimental results, the combination of f-score feature selection and support vector regression surpasses other methods. It shows that e-commerce data and machine learning algorithms can be potentially used as a proxy for predicting poverty.
multidisciplinary sciences
What problem does this paper attempt to address?