A Context-Aware Entity Ranking Method for Web-Based Data Imputation

Zhao-Qiang CHEN, Jia-Jun LI, Chuan JIANG, Hai-Long LIU, Qun CHEN, Zhan-Huai LI
DOI: https://doi.org/10.11897/SP.J.1016.2015.01755
2015-01-01
Chinese Journal of Computers
Abstract: In the Big Data era, missing data is very common in real life, and it troubles users because it makes decisions based on the data unreliable. Most existing data imputation methods use a local database to repair missing numerical values, so they do not fit the case of repairing both missing numerical and non-numerical values with data from the web. Web-based data imputation usually involves four steps: query formulation, searching, entity extraction, and entity ranking. Among these steps, entity ranking plays a key role, because it makes the final decision on the repair. Recent work on web-based data imputation falls into two categories. The first focuses on improving query formulation and entity extraction and then ranks entities by frequency; the second analyzes features of the target entities and then computes and combines the feature values to produce a ranking. Both frequency-based and weighting-based entity ranking consider only factors related to each entity itself and ignore the influence between entities. In this paper, we propose a graph-based entity ranking method called CER (Context-aware Entity Ranking), which exploits the context of candidate entities and produces a comprehensive ranking using a graph model. Experiments on real-world data collections demonstrate that CER imputes missing values from massive web data more effectively than existing entity ranking methods such as the frequency-based and weighting-based ones.
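The abstract contrasts frequency-based ranking with a graph model that uses the context of candidate entities, but it does not specify how CER builds or scores its graph. The sketch below is therefore only an illustration under stated assumptions, not the authors' CER. It assumes each search snippet yields a set of context terms plus a list of extracted candidates, builds a bipartite snippet/candidate graph, and scores candidates with personalized PageRank (a swapped-in, standard technique) whose teleport vector favors snippets whose context overlaps the incomplete tuple's known attribute values. The functions `frequency_rank` and `context_rank`, the snippet format, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def frequency_rank(snippets):
    """Baseline: order candidates by total number of mentions across snippets."""
    counts = Counter(c for _, cands in snippets for c in cands)
    return sorted(counts, key=lambda c: (-counts[c], c))


def context_rank(snippets, known_values, damping=0.85, iterations=50):
    """Hypothetical context-aware ranking via personalized PageRank.

    snippets: list of (context_terms, candidates) pairs, one per search snippet.
    known_values: terms taken from the incomplete tuple's known attributes.
    """
    # Bipartite edge weights between snippet nodes ("s", i) and candidate nodes ("c", name).
    edges = defaultdict(Counter)
    for i, (_, cands) in enumerate(snippets):
        for c in cands:
            edges[("s", i)][("c", c)] += 1
            edges[("c", c)][("s", i)] += 1

    # Teleport to snippets in proportion to their overlap with the known values;
    # fall back to a uniform teleport if no snippet overlaps at all.
    known = set(known_values)
    overlap = [len(set(ctx) & known) for ctx, _ in snippets]
    total = sum(overlap)
    if total == 0:
        overlap, total = [1] * len(snippets), len(snippets)
    teleport = {("s", i): o / total for i, o in enumerate(overlap)}

    nodes = set(teleport) | set(edges)
    score = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) * teleport.get(n, 0.0) for n in nodes}
        for n, mass in score.items():
            out = sum(edges[n].values())
            if out == 0:
                # Snippet with no candidates: hand its mass back via the teleport vector.
                for t, p in teleport.items():
                    new[t] += damping * mass * p
                continue
            # Spread this node's mass to its neighbours in proportion to edge weight.
            for m, w in edges[n].items():
                new[m] += damping * mass * w / out
        score = new

    cand_scores = {n[1]: s for n, s in score.items() if n[0] == "c"}
    return sorted(cand_scores, key=lambda c: (-cand_scores[c], c)), cand_scores


# Toy data: impute a company's headquarters city; the known attributes are its name and industry.
snippets = [
    ({"huawei", "telecom", "headquarters"}, ["Shenzhen"]),
    ({"huawei", "founded"}, ["Shenzhen"]),
    ({"travel", "hotel"}, ["Beijing"]),
    ({"travel", "flight"}, ["Beijing"]),
    ({"weather"}, ["Beijing"]),
    ({"huawei", "office"}, ["Shenzhen", "Beijing"]),
]
known = {"huawei", "telecom"}

print(frequency_rank(snippets))          # ['Beijing', 'Shenzhen']
print(context_rank(snippets, known)[0])  # ['Shenzhen', 'Beijing']
```

In this toy input, the frequency baseline prefers "Beijing" because it is mentioned more often, while the graph ranking prefers "Shenzhen" because its mentions come from snippets whose context matches the tuple. This illustrates, under the assumptions above, why considering context and the relations between entities can change the final repair.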