Improving Machine Learning-Based Code Smell Detection via Hyper-Parameter Optimization

Lei Shen, Wangshu Liu, Xiang Chen, Qing Gu, Xuejun Liu
DOI: https://doi.org/10.1109/apsec51365.2020.00036
2020-01-01
Abstract: Unlike code errors, code smells often do not affect the behavior of a software system, but they cause quality problems in terms of readability, understandability, and efficiency. To improve software quality and reduce maintenance costs, developers need to detect code smells quickly and refactor the code accordingly. Machine learning-based methods have recently become more prevalent in code smell detection and can overcome the shortcomings of heuristic-based methods, which rely mainly on manually designed rules. However, to the best of our knowledge, little research has analyzed whether hyper-parameter optimization can improve the performance of machine learning-based methods. In this study, we focus on two classical code smells (i.e., Data Class and Feature Envy). First, we consider four optimizers for hyper-parameter optimization and six commonly used classifiers for machine learning-based methods. Second, we use AUC as the performance measure to evaluate the constructed models. Based on the final empirical results, we find that (1) hyper-parameter optimization can significantly improve the performance of code smell detection; (2) the differential evolution (DE) optimizer achieves better performance than the other three optimizers when using the random forest classifier; and (3) tuning the parameters of the DE optimizer itself can further improve the performance of code smell detection.
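To make the idea of tuning a classifier's hyper-parameters with differential evolution against AUC more concrete, the following is a minimal, dependency-free sketch and not the authors' experimental setup. Several things here are assumptions made for illustration: a small k-nearest-neighbour scorer stands in for the classifiers studied in the paper (such as random forest), the data are synthetic two-feature points rather than code-smell metrics, and the DE settings (`pop_size`, `f`, `cr`, `generations`) are arbitrary values, not ones reported in the paper. All function names are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=50, seed=0):
    """Maximise `objective` over box `bounds` with a DE/rand/1/bin scheme (pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i for the mutation step.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated dimension
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    lo, hi = bounds[d]
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][d])
            fit = objective(trial)
            if fit >= fitness[i]:  # greedy selection: keep the trial if it is no worse
                pop[i], fitness[i] = trial, fit
    best = max(range(pop_size), key=lambda idx: fitness[idx])
    return pop[best], fitness[best]


def make_data(n, rng):
    """Synthetic stand-in data: class y is drawn around the point (y, y)."""
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        data.append(((rng.gauss(y, 1.0), rng.gauss(y, 1.0)), y))
    return data


def knn_scores(train, points, k, p):
    """Score each point by the share of positives among its k nearest training points."""
    scores = []
    for x in points:
        dists = sorted(
            (sum(abs(u - v) ** p for u, v in zip(x, tx)), ty) for tx, ty in train
        )
        scores.append(sum(ty for _, ty in dists[:k]) / k)
    return scores


data_rng = random.Random(42)
train_set = make_data(60, data_rng)
valid_set = make_data(40, data_rng)
valid_x = [x for x, _ in valid_set]
valid_y = [y for _, y in valid_set]


def validation_auc(params):
    """Objective for DE: decode (k, p) from a real-valued vector and return validation AUC."""
    k = max(1, int(round(params[0])))
    return auc(valid_y, knn_scores(train_set, valid_x, k, params[1]))


# Search k in [1, 15] and the Minkowski exponent p in [1, 3].
best_params, best_auc = differential_evolution(
    validation_auc, [(1.0, 15.0), (1.0, 3.0)], pop_size=10, generations=15, seed=7
)
best_k, best_p = max(1, int(round(best_params[0]))), best_params[1]
print(f"best k={best_k}, p={best_p:.2f}, validation AUC={best_auc:.3f}")
```

In this sketch `f`, `cr`, and `pop_size` are the optimizer's own settings; the abstract's third finding presumably concerns tuning parameters of this kind, although the abstract does not name them. A realistic replication would also score each candidate with cross-validation rather than a single validation split.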