How much is my car worth? A methodology for predicting used cars prices using Random Forest

Nabarun Pal, Priya Arora, Dhanasekar Sundararaman, Puneet Kohli, Sai Sumanth Palakurthy
DOI: https://doi.org/10.48550/arXiv.1711.06970
2017-11-19
Abstract: Cars are being sold more than ever. In developing countries, buyers adopt a lease culture instead of buying a new car because of affordability. As a result, used car sales are increasing exponentially. Car sellers sometimes take advantage of this demand by listing unrealistic prices. There is therefore a need for a model that can assign a price to a vehicle by evaluating its features while taking the prices of other cars into consideration. In this paper, we use a supervised learning method, namely Random Forest, to predict the prices of used cars. The model was chosen after careful exploratory data analysis to determine the impact of each feature on price. A Random Forest with 500 Decision Trees was created and trained on the data. In the experiments, the training accuracy was 95.82% and the testing accuracy was 83.63%. The model can predict the price of cars accurately by choosing the most correlated features.
Computers and Society, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of accurately predicting the price of second-hand cars. As second-hand car sales grow worldwide, consumers may be exploited through unreasonable pricing, so a model that can estimate a vehicle's value from its characteristics is particularly useful. The paper proposes using the Random Forest machine-learning method to predict second-hand car prices and verifies the model's effectiveness through experiments. Specifically, after collecting and pre-processing the data set, the authors select the features that have the greatest impact on price and build a Random Forest model containing 500 decision trees, which reaches an accuracy of 95.82% on the training set and 83.63% on the test set.
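The approach summarized above can be illustrated with a minimal sketch of Random Forest regression, written with only the Python standard library. It is not the authors' implementation. The feature names (`age`, `km`, `power`), the synthetic price formula, the tree depth, and the one-third feature-subset heuristic are all assumptions made for illustration. The abstract does not define "accuracy", so the sketch reports the coefficient of determination (R²) as one plausible reading.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, prices, feature_ids):
    """Find the (feature, threshold) pair that minimises the children's total SSE."""
    best, best_err = None, float("inf")
    for f in feature_ids:
        # Skipping the smallest value keeps both children non-empty.
        for t in sorted({row[f] for row in rows})[1:]:
            left = [p for row, p in zip(rows, prices) if row[f] < t]
            right = [p for row, p in zip(rows, prices) if row[f] >= t]
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (f, t), err
    return best


def build_tree(rows, prices, depth, n_try, rng):
    """Grow one regression tree; a leaf is a float, an inner node a 4-tuple."""
    if depth == 0 or len(rows) < 4:
        return fmean(prices)
    candidates = rng.sample(range(len(rows[0])), n_try)  # random feature subset
    split = best_split(rows, prices, candidates)
    if split is None:  # no candidate feature has two distinct values
        return fmean(prices)
    f, t = split
    left = [i for i, row in enumerate(rows) if row[f] < t]
    right = [i for i, row in enumerate(rows) if row[f] >= t]
    return (f, t,
            build_tree([rows[i] for i in left], [prices[i] for i in left],
                       depth - 1, n_try, rng),
            build_tree([rows[i] for i in right], [prices[i] for i in right],
                       depth - 1, n_try, rng))


def fit_forest(rows, prices, n_trees=500, max_depth=6, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(rows)
    n_try = max(1, len(rows[0]) // 3)  # assumed heuristic: a third of the features
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx], [prices[i] for i in idx],
                                 max_depth, n_try, rng))
    return forest


def predict(forest, row):
    """Average the predictions of all trees for one car."""
    outputs = []
    for node in forest:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] < t else right
        outputs.append(node)
    return fmean(outputs)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = fmean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the paper's data set): price falls with age and
# mileage, rises with engine power, plus Gaussian noise.
data_rng = random.Random(42)
cars = []
for _ in range(200):
    age = data_rng.uniform(0, 15)
    km = data_rng.uniform(0, 200_000)
    power = data_rng.uniform(50, 250)
    price = 30_000 - 1_200 * age - 0.05 * km + 40 * power + data_rng.gauss(0, 500)
    cars.append(([age, km, power], price))

train, test = cars[:150], cars[150:]
X_train, y_train = [c[0] for c in train], [c[1] for c in train]
X_test, y_test = [c[0] for c in test], [c[1] for c in test]

# The paper uses 500 trees; 25 keeps this pure-Python demo fast.
forest = fit_forest(X_train, y_train, n_trees=25)
train_r2 = r2_score(y_train, [predict(forest, x) for x in X_train])
test_r2 = r2_score(y_test, [predict(forest, x) for x in X_test])
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

Comparing `train_r2` with `test_r2` mirrors the training-versus-testing comparison reported in the abstract. In practice, a library implementation such as scikit-learn's `RandomForestRegressor` with `n_estimators=500` would be a more typical choice than hand-written trees.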