Metadata Extracting for HTML Document Based on Rules

狄涤,周竞扬,潘金贵
DOI: https://doi.org/10.3969/j.issn.1000-3428.2004.09.034
2004-01-01
Abstract: This paper proposes a rule-based method for extracting metadata from HTML documents. After introducing the syntax and semantics of the rules, it discusses the design of the rule library. Based on this method, a system named MEDES (Metadata Extracting System) was developed to extract metadata from HTML documents automatically. The experimental results are evaluated, and future work is discussed at the end.
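
The abstract does not give the rule syntax or the layout of the rule library used by MEDES, so the following is only a minimal sketch of the general idea of rule-based metadata extraction. The rule format, the names `RULES`, `RuleExtractor`, and `extract_metadata`, and the chosen fields are all assumptions made for illustration, not the paper's design.

```python
from html.parser import HTMLParser

# Hypothetical rule library (not the MEDES format): each rule names a
# metadata field, the tag to match, optional required attributes, and
# where the value comes from ("text" or an attribute name).
RULES = [
    {"field": "title", "tag": "title", "source": "text"},
    {"field": "author", "tag": "meta", "attrs": {"name": "author"}, "source": "content"},
    {"field": "keywords", "tag": "meta", "attrs": {"name": "keywords"}, "source": "content"},
    {"field": "title", "tag": "h1", "source": "text"},  # fallback if <title> is absent
]


class RuleExtractor(HTMLParser):
    """Applies the rules while parsing; the first value found for a field wins."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.metadata = {}
        self._open = None  # (rule, text chunks) for a text rule awaiting its end tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for rule in self.rules:
            if rule["tag"] != tag:
                continue
            required = rule.get("attrs", {})
            # Attribute values are compared case-insensitively.
            if any((attrs.get(k) or "").lower() != v for k, v in required.items()):
                continue
            if rule["source"] == "text":
                self._open = (rule, [])
                break
            value = attrs.get(rule["source"])
            if value:
                self.metadata.setdefault(rule["field"], value.strip())

    def handle_data(self, data):
        if self._open is not None:
            self._open[1].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]["tag"]:
            rule, chunks = self._open
            text = "".join(chunks).strip()
            if text:
                self.metadata.setdefault(rule["field"], text)
            self._open = None


def extract_metadata(html, rules=RULES):
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.metadata
```

In this sketch the rules are plain data, so a new field can be supported by appending a rule rather than changing the parser, which is one plausible reason to keep a separate rule library.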