Interactive Mining of Schema for Semistructured Data.

YB Liu, YC Feng
DOI: https://doi.org/10.1117/12.460250
2002-01-01
Abstract: Semistructured data such as HTML, SGML and XML documents lack a fixed, rigid schema, but some implicit structure typically appears in the data. The central problem of schema mining is to discover these hidden structures. The huge number of on-line applications makes it important to mine schemas for semistructured data. Because the minimum support reflects the user's particular interests, the user may need to tune it dynamically in the course of mining; this paper therefore presents the problem of interactive mining of schemas for semistructured data. When the user changes the minimum support during interactive mining, one way to discover the interesting schemas is to re-run the schema mining algorithm from scratch with the new minimum support. However, this approach is inefficient because it does not use the results already mined. Hence an incremental mining algorithm is presented. In addition, an improved algorithm for finding the maximal schema tree sets is given. The experimental results show that the incremental algorithm is more efficient than the non-incremental Apriori-like algorithm.
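The contrast between re-mining from scratch and reusing earlier results can be illustrated with a minimal sketch. It is not the authors' algorithm: it mines frequent root-to-node label paths rather than schema trees, and the names `InteractivePathMiner`, `support_cache`, `paths_counted` and `EXAMPLE_DOCS` are hypothetical, introduced only for illustration. The assumption is that support counts computed under one minimum support stay valid under any other, so only paths never counted before need a new pass over the data.

```python
from collections import defaultdict


class InteractivePathMiner:
    """Level-wise (Apriori-like) miner that keeps support counts between runs.

    Illustrative only: works on label paths, not on the schema trees of the paper.
    """

    def __init__(self, documents):
        # Each document is a set of root-to-node label paths (tuples of labels).
        # Close every document under prefixes so that support is anti-monotone.
        self.documents = []
        for doc in documents:
            closed = set()
            for path in doc:
                for end in range(1, len(path) + 1):
                    closed.add(tuple(path[:end]))
            self.documents.append(closed)
        # children[prefix] = one-label extensions of prefix seen in the data.
        self.children = defaultdict(set)
        for doc in self.documents:
            for path in doc:
                self.children[path[:-1]].add(path)
        self.support_cache = {}  # path -> number of documents containing it
        self.paths_counted = 0   # paths that needed a pass over the documents

    def _support(self, path):
        # Count a path only the first time it is seen; later runs reuse the cache.
        if path not in self.support_cache:
            self.support_cache[path] = sum(path in doc for doc in self.documents)
            self.paths_counted += 1
        return self.support_cache[path]

    def mine(self, min_support):
        """Return all paths contained in at least min_support of the documents."""
        threshold = min_support * len(self.documents)
        frequent = []
        frontier = [()]  # start from the empty prefix
        while frontier:
            next_frontier = []
            for prefix in frontier:
                # Only extensions of frequent prefixes become candidates.
                for candidate in self.children[prefix]:
                    if self._support(candidate) >= threshold:
                        frequent.append(candidate)
                        next_frontier.append(candidate)
            frontier = next_frontier
        return sorted(frequent)


# Hypothetical toy data: four documents described by their label paths.
EXAMPLE_DOCS = [
    {("book", "title"), ("book", "author")},
    {("book", "title")},
    {("book", "title"), ("book", "price")},
    {("article", "title")},
]
```

Calling `mine(0.75)` on `EXAMPLE_DOCS` counts five paths; a later `mine(0.25)` counts only the one path not examined before, and raising the threshold again to `0.5` needs no counting at all. A from-scratch run would recount every candidate each time.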