Extracting Linked Data from HTML Tables

Ahmed Ktob, Zhoujun Li, Djelloul Bouchiha
DOI: https://doi.org/10.1109/cic.2017.00018
2017-01-01
Abstract: The web plays a crucial role in daily life, and its openness gives users access to data around the clock. Recently, data has become more exploitable by machines thanks to linked data, a mechanism that dramatically improves the quality of data published on the web. We therefore set out to reuse the data that already exists on the web, particularly in web applications, to generate linked data. To this end, we propose a set of transformation rules that extract data from HTML tables and convert them into RDF (Resource Description Framework) triples. Our approach is based on the direct conversion of relational data into RDF triples proposed by the W3C. The extraction of RDF triples is automatic, but the detection of primary and foreign keys remains manual. We also developed a tool, HTML2RDF, that carries out the extraction process. The results obtained with HTML2RDF are promising; however, their quality depends on the correct determination of primary and foreign keys.
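The kind of conversion the abstract describes can be illustrated with a minimal Python sketch. It is not the HTML2RDF tool and does not reproduce the paper's transformation rules; it only assumes the general shape of the W3C direct mapping (one subject IRI per row built from the primary key, one predicate IRI per column, plus a type triple for the table). The names `TableParser` and `table_to_triples`, the base IRI `http://example.org/`, and the sample table are hypothetical. As in the abstract, the primary key is supplied by hand, and foreign keys are left out for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> as a list of rows.

    Assumes a single, well-formed table whose first row holds the headers.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_triples(html, table_name, primary_key, base="http://example.org/"):
    """Return (subject, predicate, object) tuples for each data row.

    primary_key is chosen manually by the caller (an assumption that mirrors
    the manual key detection mentioned in the abstract).
    """
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *records = parser.rows
    table_iri = base + quote(table_name)
    triples = []
    for record in records:
        row = dict(zip(header, record))
        # Row IRI in the direct-mapping style: <table>/<key>=<value>
        subject = f"{table_iri}/{quote(primary_key)}={quote(row[primary_key])}"
        triples.append((subject, RDF_TYPE, table_iri))
        for column, value in row.items():
            if value:  # empty cells yield no triple
                triples.append((subject, f"{table_iri}#{quote(column)}", value))
    return triples


SAMPLE = (
    "<table>"
    "<tr><th>id</th><th>name</th></tr>"
    "<tr><td>1</td><td>Alice</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    for triple in table_to_triples(SAMPLE, "People", "id"):
        print(triple)
```

Choosing a different column as `primary_key` changes every subject IRI, which is one way to see why the quality of the output depends on identifying the keys correctly.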