Abstract: The discovery of association rules from large amounts of structured or semi-structured data is an important data mining problem [Agrawal et al. 1993, Agrawal and Srikant 1994, Miyahara et al. 2001, Termier et al. 2002, Braga et al. 2002, Cong et al. 2002, Braga et al. 2003, Xiao et al. 2003, Maruyama and Uehara 2000, Wang and Liu 2000]. It has crucial applications in decision support and marketing strategy. The prototypical application of association rules is market basket analysis using transaction databases from supermarkets. These databases contain sales transaction records, each of which lists the items bought by a customer in that transaction. Mining association rules is the process of discovering knowledge such as “80% of customers who bought diapers also bought beer, and 35% of customers bought both diapers and beer”, which can be expressed as “diaper ⇒ beer” (35%, 80%), where 80% is the confidence of the rule and 35% is its support, indicating how frequently customers bought both diapers and beer. In general, an association rule takes the form X ⇒ Y (s, c), where X and Y are sets of items, and s and c are the support and confidence, respectively. In the XML era, mining association rules faces more challenges than in the traditional well-structured world because of the inherent flexibility of XML in both structure and semantics [Feng and Dillon 2005]. First, XML data has a more complex hierarchical structure than a database record. Second, elements in XML data have contextual positions and therefore carry a notion of order. Third, XML data appears to be much larger than traditional data. To address these challenges, the classic association rule mining framework, which originated with transactional databases, needs to be re-examined.
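
The support and confidence of a rule X ⇒ Y (s, c) can be illustrated with a minimal Python sketch for the flat transactional case only. It does not address the hierarchical or ordered structure of XML data. The function names and the toy transactions are assumptions made for illustration; the counts are chosen so that the result reproduces the (35%, 80%) figures of the diaper and beer example.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def count_containing(transactions: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(
    transactions: List[Transaction], x: FrozenSet[str], y: FrozenSet[str]
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule X => Y.

    support    = fraction of all transactions containing both X and Y
    confidence = fraction of the transactions containing X that also contain Y
    """
    total = len(transactions)
    both = count_containing(transactions, x | y)
    antecedent = count_containing(transactions, x)
    support = both / total if total else 0.0
    confidence = both / antecedent if antecedent else 0.0
    return support, confidence


def toy_transactions() -> List[Transaction]:
    """Hypothetical data: 80 baskets, 35 with diapers, 28 of those with beer."""
    return (
        [frozenset({"diaper", "beer"})] * 28
        + [frozenset({"diaper", "milk"})] * 7
        + [frozenset({"milk", "bread"})] * 45
    )


if __name__ == "__main__":
    s, c = rule_metrics(toy_transactions(), frozenset({"diaper"}), frozenset({"beer"}))
    print(f"diaper => beer (support={s:.0%}, confidence={c:.0%})")
```

Working from raw counts rather than from two separately computed support values keeps the confidence a single division, which avoids compounding rounding error on larger data sets.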
