DBSCAN based Automatic de-duplication for software quality inspection data

Chun-Hua Cao, Ya-Na Tang, Hua Zhou, Yu-Li Li, Zbigniew Marszałek
DOI: https://doi.org/10.1109/access.2022.3164192
IF: 3.9
2022-01-01
IEEE Access
Abstract: Software quality inspection generates large volumes of data, and removing duplicate records can improve inspection efficiency. This paper studies an automatic de-duplication method for software quality inspection data based on density-based spatial clustering of applications with noise (DBSCAN). An intelligent optimization algorithm generates the software quality inspection data by initializing individuals, calculating fitness function values, improving individuals, and splitting individuals that meet the conditions. A locally linear embedding (LLE) algorithm extracts features from the data by searching for neighborhood points and calculating the reconstruction weights and projection vectors. The extracted features are passed to a region-partitioned multi-density DBSCAN clustering algorithm, which performs automatic de-duplication through grid division, data binning, and grid merging. Experimental results show that the precision and recall of the method both exceed 99% and that its resource consumption rate is low, so it can effectively improve the efficiency of software quality inspection.
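The clustering step at the core of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain single-density DBSCAN with a brute-force neighbor search, and it omits the LLE feature extraction as well as the grid division, data binning, and grid merging that the paper describes. The function names (`region_query`, `dbscan`, `deduplicate`), the parameter defaults, and the toy feature vectors are all assumptions made for illustration. Records whose features fall in the same dense cluster are treated as duplicates of one another, and noise points are kept as unique records.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN (brute-force neighbor search); one label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def deduplicate(records, features, eps=0.1, min_pts=2):
    """Keep the first record of each cluster plus every noise record.

    Hypothetical helper: eps and min_pts are illustrative defaults,
    not values taken from the paper.
    """
    labels = dbscan(features, eps, min_pts)
    seen, kept = set(), []
    for record, label in zip(records, labels):
        if label == NOISE:
            kept.append(record)  # no near-duplicate found
        elif label not in seen:
            seen.add(label)
            kept.append(record)  # first record represents its cluster
    return kept


# Toy data: two near-duplicate pairs and one isolated record.
records = ["r1", "r1-copy", "r2", "r3", "r3-copy"]
features = [(0.0, 0.0), (0.01, 0.0), (5.0, 5.0), (9.0, 1.0), (9.0, 1.02)]
print(deduplicate(records, features))  # ['r1', 'r2', 'r3']
```

Keeping the first record of each cluster is only one possible policy; a real pipeline might instead keep the record closest to the cluster centroid or the most recent one.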
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic