Investigation of Rounding Algorithms Combining Data Distribution Characteristics

Mengyuan Zhu, Qian Zhang, Yunwei Zhang, Tao Shen, Baochang Zhang
DOI: https://doi.org/10.1109/iciea54703.2022.10006192
2022-01-01
Abstract: In scenarios such as user ratings and recommender systems, ratings are usually expressed as integers. However, regression models based on machine learning or deep learning usually predict floating-point values. The gap between integer ratings and floating-point predictions increases the mean absolute error (MAE). In this paper, we first compare three conventional rounding methods: rounding up, rounding down, and rounding off (rounding to the nearest integer). The results show that conventional rounding methods ignore the distribution of the original data. We therefore propose two novel rounding algorithms that incorporate the real-world data distribution, with the aim of reducing the MAE of LightGBM regression outputs. First, we propose an adjacent rounding algorithm based on the distribution of adjacent labels: each predicted value is rounded to whichever of its two neighbouring integer labels occurs more frequently in the data. We then extend this algorithm to the top-n label values, i.e., the n labels that occur most frequently. Finally, we propose a global optimal rounding algorithm that additionally takes into account the distance between each predicted value and the top-n labels. The proposed methods show potential for application in recommender systems, user ratings, and similar scenarios.
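To make the comparison concrete, here is a minimal Python sketch of the three conventional rounding methods and of the adjacent rounding rule as the abstract describes it. It is not the authors' implementation. It assumes that "rounding off" means round-half-up, that label frequencies are counted on the training labels, and that a tie between the two neighbouring labels falls back to round-half-up; the function names and the toy data are hypothetical. The top-n and global optimal variants are not sketched, because the abstract does not specify how label frequency and distance are combined.

```python
import math
from collections import Counter


def round_up(x):
    return math.ceil(x)


def round_down(x):
    return math.floor(x)


def round_off(x):
    # Assumption: "rounding off" = round half up. Python's built-in round()
    # uses round-half-to-even, so floor(x + 0.5) is used instead.
    return math.floor(x + 0.5)


def adjacent_round(x, freq):
    """Round x to whichever neighbouring integer label is more frequent.

    freq maps an integer label to its count in the (assumed) training labels.
    Labels missing from freq count as 0; ties fall back to round_off.
    """
    lo, hi = math.floor(x), math.ceil(x)
    if lo == hi:  # x is already an integer
        return lo
    f_lo, f_hi = freq.get(lo, 0), freq.get(hi, 0)
    if f_hi > f_lo:
        return hi
    if f_lo > f_hi:
        return lo
    return round_off(x)


def mae(preds, labels):
    """Mean absolute error between predictions and true labels."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    train_labels = [5, 5, 5, 4, 3, 5, 4, 1]
    preds = [4.4, 3.6, 4.6, 2.2]  # raw regression outputs
    truth = [5, 4, 5, 2]

    freq = Counter(train_labels)
    methods = {
        "up": round_up,
        "down": round_down,
        "off": round_off,
        "adjacent": lambda x: adjacent_round(x, freq),
    }
    print("raw MAE:", mae(preds, truth))
    for name, fn in methods.items():
        rounded = [fn(p) for p in preds]
        print(name, rounded, "MAE:", mae(rounded, truth))
```

On such toy data the rules can disagree only when a prediction falls between two labels whose frequencies differ sharply (here, 4.4 is pulled up to the more frequent label 5); whether this lowers MAE on real data is the empirical question the paper addresses.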