From Text to XML by Structural Information Extraction

Yong Piao, Tianyu Wang, He Jiang
DOI: https://doi.org/10.1109/compcomm.2015.7387613
2015-01-01
Abstract: Faced with a tremendous volume of semi-structured XML and unstructured free text, network information retrieval has become a major research hotspot for handling these data more efficiently, precisely, and uniformly. Many traditional IR methods ignore text semantics, and their labeling results usually have only one level and lack contextual expression. This paper therefore studies structure extraction from free text and its conversion to XML format, and proposes a CRF-based algorithm, SIECRF. Experimental results are analyzed, showing that the algorithm is efficient at extracting text structure and has good application prospects.
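To make the text-to-XML conversion step concrete, the following is a minimal sketch of how multi-level labels could be turned into nested XML once a sequence labeler has assigned them. It is not the SIECRF algorithm from the paper: the function name `labels_to_xml`, the slash-separated label paths such as `header/title`, and the sample tokens are all assumptions made for illustration, and the labels are supplied by hand rather than predicted by a CRF.

```python
import xml.etree.ElementTree as ET


def labels_to_xml(tagged_tokens, root_tag="document"):
    """Build nested XML from (token, label_path) pairs.

    Hypothetical format: label_path is a slash-separated path such as
    "header/title". Consecutive tokens with the same path are merged
    into one element. Every token is assumed to carry a leaf-level path.
    """
    root = ET.Element(root_tag)
    open_path = []       # tags of the elements currently open below root
    open_elems = [root]  # element stack; open_elems[i + 1] matches open_path[i]

    for token, label_path in tagged_tokens:
        path = label_path.split("/")

        # Length of the prefix shared with the currently open path.
        common = 0
        while (common < min(len(path), len(open_path))
               and path[common] == open_path[common]):
            common += 1

        # Close elements that are no longer part of the path.
        del open_path[common:]
        del open_elems[common + 1:]

        # Open the elements that are new in this path.
        for tag in path[common:]:
            open_elems.append(ET.SubElement(open_elems[-1], tag))
            open_path.append(tag)

        # Append the token to the text of the innermost element.
        leaf = open_elems[-1]
        leaf.text = token if leaf.text is None else leaf.text + " " + token

    return root


# Hand-labeled sample input (assumed, not taken from the paper's data).
sample = [
    ("From", "header/title"), ("Text", "header/title"),
    ("to", "header/title"), ("XML", "header/title"),
    ("Yong", "header/author"), ("Piao", "header/author"),
    ("Facing", "body/abstract"),
]
xml_string = ET.tostring(labels_to_xml(sample), encoding="unicode")
print(xml_string)
```

The multi-level paths are what distinguish this from a flat, one-level labeling: the shared `header` prefix places the title and author under a common parent, which gives each extracted field its context.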