Integrating Multi-Source Web Records into Relational Database

Huang Jianbin, Ji Hongbing, Sun Heli
DOI: https://doi.org/10.1007/bf02829232
2006-01-01
Wuhan University Journal of Natural Sciences
Abstract: Integrating heterogeneous semi-structured Web records into a relational database is an important and challenging research topic. An improved conditional random field model is presented that combines learning from labeled samples with learning from unlabeled database records, reducing the dependence on tediously hand-labeled training data. The proposed model is applied to the problem of schema matching between the data source schema and the database schema. Experimental results on a large number of Web pages from diverse domains demonstrate the effectiveness of the approach.