DEEP WEB DATA SOURCES CLASSIFICATION BASED ON TEXT VSM OF QUERY INTERFACE

Shi Long, Qiang Baohua, Wu Chunming
DOI: https://doi.org/10.3969/j.issn.1000-386x.2013.08.015
2013-01-01
Abstract: With the rapid development of Internet technology, a large number of Web databases have emerged, and their number continues to grow quickly. To organise and use the information hidden deep in these Web databases effectively, they need to be classified and integrated by domain. Because a Web page's query interface is the only channel for accessing a Web database, Deep Web data sources can be classified by classifying their query interfaces. This paper proposes a classification method based on a text vector space model (VSM) of the query interface. The method first builds a VSM from the text of the query interfaces. It then applies typical data mining classification algorithms to train one or more classifiers, which assign each query interface to the domain it belongs to. Experimental results show that the proposed approach achieves excellent classification performance.
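The two-step pipeline in the abstract (build a VSM from query interface text, then train a classifier on it) can be sketched as follows. The abstract states neither the term weighting nor which classification algorithm the authors use. This sketch therefore assumes TF-IDF weighting and a nearest-centroid classifier with cosine similarity. All function names (`build_idf`, `vectorize`, `train_centroids`, `classify`) and the toy interface texts and domain labels are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the interface text (labels, field names) and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_idf(docs):
    """Inverse document frequency per term (assumed weighting: log(N / df) + 1)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(text, idf):
    """Map one query interface's text to a unit-length TF-IDF vector; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train_centroids(texts, labels, idf):
    """Hypothetical classifier choice: one normalised centroid vector per domain."""
    sums = {}
    for text, label in zip(texts, labels):
        centroid = sums.setdefault(label, Counter())
        for t, w in vectorize(text, idf).items():
            centroid[t] += w
    return {label: normalize(c) for label, c in sums.items()}


def classify(text, centroids, idf):
    """Assign the domain whose centroid is most similar to the interface's vector."""
    vec = vectorize(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training data: text extracted from six query interfaces.
train_texts = [
    "departure city arrival city depart date return date passengers",
    "from airport to airport departure date cabin class",
    "title author isbn publisher keywords",
    "book title author subject format",
    "make model year price range mileage",
    "car make model zip code max price",
]
train_labels = ["airfare", "airfare", "books", "books", "automobiles", "automobiles"]

idf = build_idf(train_texts)
centroids = train_centroids(train_texts, train_labels, idf)
print(classify("author title keywords", centroids, idf))  # → books
```

Nearest-centroid is used here only because it fits in a few lines. Any standard classifier (for example naive Bayes, kNN or an SVM) could be trained on the same TF-IDF vectors instead.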