Research on telecom insolvency mining oriented data quality assessing strategy

WANG Xiaohua, SU Hongye, QU Yu, CHU Jian
DOI: https://doi.org/10.3778/j.issn.1002-8331.2011.12.062
2011-01-01
Abstract: Targeting telecom insolvency mining and taking into account the imbalanced nature of telecom insolvency data, this work focuses on how missing values and outliers affect classification results, and presents a Data Quality Assessment System for Telecom Insolvency Mining (TIM-DQAS). In the missing-value evaluation subsystem, a class-distribution-based attribute weighting algorithm is presented to measure the cost of missing values in each input attribute. In the outlier evaluation subsystem, the impact of outliers on classification results in imbalanced data is analyzed, and an outlier degree is proposed to measure that impact. Based on a series of comparative experiments on personal handphone telecom data from one city, a reference assessment result is provided and the effectiveness of the assessment strategy is verified.