A Machine Learning Approach Classification of Deep Web Sources

Hexiang Xu, Chenghong Zhang, Xiulan Hao, Yunfa Hu
DOI: https://doi.org/10.1109/FSKD.2007.54
2007-01-01
Abstract: The classification of deep Web sources is an important area in large-scale deep Web integration, which is still at an early stage. Many deep Web sources are structured: they provide structured query interfaces and return structured results. Classifying such structured sources into domains is one of the critical steps toward the integration of heterogeneous Web sources. To date, existing classification work has focused mainly on texts or Web documents, and little of it addresses the deep Web. In this paper, we present a deep Web model and a machine-learning-based classification model. The experimental results show that good performance can be achieved with a small number of training samples per domain, and that performance remains stable as the number of training samples increases.