Research on Deep Web Classification Based on Domain Feature Text

WU Chun-ming, XIE De-ti
DOI: https://doi.org/10.3969/j.issn.1002-137x.2012.04.040
2012-01-01
Computer Science
Abstract: Automatic Deep Web classification is the basis for building a Deep Web data integration system. This paper proposes an approach to classifying Deep Web sources based on domain feature text. Using ontology knowledge, concepts that express the same semantics are first extracted from different texts. A definition of domain correlation is then given as a quantitative criterion for feature text selection, in order to avoid the subjectivity and uncertainty of manual selection. When constructing the interface vector space model, an improved weighting method named W-TFIDF is proposed to account for the different roles that feature texts play. Finally, a KNN algorithm is used to classify the interface vectors. Comparative experiments indicate that the feature text selected by this method is accurate and effective, and that the new weighting method significantly improves classification precision and shows good stability in KNN classification.
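The pipeline described in the abstract (weighted vector space model followed by KNN) can be illustrated with a minimal Python sketch. The abstract does not give the W-TFIDF formula or the definition of domain correlation, so everything here is an assumption: the sketch simply multiplies a smoothed TF-IDF score by a hypothetical per-term role weight (`role_weight`), and the function names, the toy interface terms, and the weights are invented for illustration only. It is not the authors' method.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency over tokenized training interfaces."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(doc, idf, role_weight):
    """TF-IDF scaled by an assumed per-term role weight (default 1.0).

    This is a stand-in for W-TFIDF; the paper's actual formula is not
    given in the abstract. Terms unseen during training are ignored.
    """
    tf = Counter(doc)
    return {
        term: (count / len(doc)) * idf[term] * role_weight.get(term, 1.0)
        for term, count in tf.items()
        if term in idf
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query_vec, train_vecs, labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical query-interface feature texts and domain labels.
train_docs = [
    ["title", "author", "isbn", "publisher"],
    ["title", "author", "keyword", "price"],
    ["departure", "arrival", "date", "passengers"],
    ["from", "to", "date", "class"],
]
train_labels = ["books", "books", "flights", "flights"]

# Hypothetical role weights: terms assumed to be more domain-specific count more.
role_weight = {"isbn": 2.0, "author": 2.0, "departure": 2.0, "arrival": 2.0}

idf = fit_idf(train_docs)
train_vecs = [vectorize(doc, idf, role_weight) for doc in train_docs]

query = vectorize(["author", "title", "price"], idf, role_weight)
print(knn_classify(query, train_vecs, train_labels, k=3))
```

In this sketch the IDF table is fitted on the training interfaces only and reused for queries, so a new interface is always scored against the same vocabulary as the labelled ones.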