An Optimal House Price Prediction Algorithm: XGBoost

Hemlata Sharma, Hitesh Harsora, Bayode Ogunleye
DOI: https://doi.org/10.3390/analytics3010003
2024-02-07
Abstract: An accurate prediction of house prices is a fundamental requirement for various sectors including real estate and mortgage lending. It is widely recognized that a property value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighbourhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed the house price prediction problem as a regression task and thus employed various machine learning techniques capable of expressing the significance of independent variables. We made use of the housing dataset of Ames City in Iowa, USA to compare support vector regressor, random forest regressor, XGBoost, multilayer perceptron and multiple linear regression algorithms for house price prediction. Afterwards, we identified the key factors that influence housing costs. Our results show that XGBoost is the best performing model for house price prediction.
Machine Learning, Methodology, Artificial Intelligence, Applications
What problem does this paper attempt to address?
This paper addresses house price prediction, a central task for the real estate and mortgage industries, given how much the real estate sector contributes to the global economy. It notes that a house's value depends not only on its physical characteristics but also on its surrounding neighbourhood. To help real estate developers meet diverse housing needs within budget constraints, the researchers treat house price prediction as a regression problem and compare five machine learning (ML) techniques on housing data from Ames, Iowa, United States: XGBoost, support vector regression, random forest regression, multilayer perceptron, and multiple linear regression. The results show that XGBoost performs best.
The paper also identifies the key factors that affect house prices, which helps stakeholders value properties more accurately and make better-informed decisions. From its review of the literature, it argues that although previous studies have tried various ML algorithms, few have examined the factors that influence house prices, and most have neither optimized their models nor discussed feature importance in depth. The paper therefore offers guidance on building an optimal house price prediction model through comprehensive model comparison and hyperparameter tuning.
The paper is organized into four main parts: literature review, methodology and evaluation process, results, and conclusion with future research directions.
The paper uses the Ames housing dataset from Kaggle to compare the regression models and ultimately recommends XGBoost for its interpretability, simplicity, and high accuracy.
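The comparison workflow summarized above (treat price as a regression target, fit several candidate models, rank them by held-out error) can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption made for illustration and not the authors' code: the data are synthetic rather than the Ames dataset, a single made-up feature (living area) stands in for the full feature set, RMSE is an assumed choice of metric, and plain gradient-boosted decision stumps stand in for XGBoost, which adds regularization and many other refinements.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Baseline: always predict the mean training price."""
    mean = sum(ys) / len(ys)
    return lambda new_xs: [mean for _ in new_xs]


def fit_linear(xs, ys):
    """Ordinary least squares with a single feature (closed form)."""
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return lambda new_xs: [intercept + slope * x for x in new_xs]


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=60, learning_rate=0.1):
    """Plain gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    return lambda new_xs: [
        base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)
        for x in new_xs
    ]


# Synthetic stand-in data (NOT the Ames dataset): price grows with living area.
rng = random.Random(0)
areas = [rng.uniform(50, 250) for _ in range(120)]
prices = [1000 * area + rng.gauss(0, 10000) for area in areas]
train_x, test_x = areas[:90], areas[90:]
train_y, test_y = prices[:90], prices[90:]

candidates = {
    "mean_baseline": fit_mean,
    "linear_regression": fit_linear,
    "boosted_stumps": fit_boosted_stumps,
}
# Fit each candidate on the training split and score it on the held-out split.
scores = {name: rmse(test_y, fit(train_x, train_y)(test_x))
          for name, fit in candidates.items()}
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: held-out RMSE = {score:.0f}")
```

Which candidate ranks first on this toy data says nothing about the paper's findings on the Ames data. The sketch only shows the shape of the comparison; hyperparameter tuning in the same spirit would repeat the fit over a grid of settings such as `n_rounds` and `learning_rate` and keep the setting with the lowest validation error.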