Abstract: The discovery of association rules from large amounts of structured or semi-structured data is an important data mining problem [Agrawal et al. 1993, Agrawal and Srikant 1994, Miyahara et al. 2001, Termier et al. 2002, Braga et al. 2002, Cong et al. 2002, Braga et al. 2003, Xiao et al. 2003, Maruyama and Uehara 2000, Wang and Liu 2000]. It has crucial applications in decision support and marketing strategy. The most prototypical application of association rules is market basket analysis using transaction databases from supermarkets. These databases contain sales transaction records, each of which details the items bought by a customer in that transaction. Mining association rules is the process of discovering knowledge such as “80% of customers who bought diapers also bought beer, and 35% of customers bought both diapers and beer”, which can be expressed as “diaper ⇒ beer” (35%, 80%), where 80% is the confidence level of the rule, and 35% is the support level of the rule, indicating how frequently customers bought both diapers and beer. In general, an association rule takes the form X ⇒ Y (s, c), where X and Y are sets of items, and s and c are the support and confidence, respectively. In the XML era, mining association rules faces more challenges than in the traditional well-structured world because of the inherent flexibility of XML in both structure and semantics [Feng and Dillon 2005]. First, XML data has a more complex hierarchical structure than a database record. Second, elements in XML data have contextual positions and thus carry a notion of order. Third, XML data appears to be much larger than traditional data. To address these challenges, the classic association rule mining framework originating with transactional databases needs to be re-examined.
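The support and confidence of a rule X ⇒ Y, as described in the abstract, can be illustrated with a minimal Python sketch. The function name `support_and_confidence` and the toy baskets are assumptions made for illustration only; the data is constructed so that the result matches the abstract's (35%, 80%) example and is not taken from any real dataset.

```python
from typing import Iterable, Set, Tuple


def support_and_confidence(
    transactions: Iterable[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of all transactions containing X and Y together
    confidence = fraction of transactions containing X that also contain Y
    """
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    baskets = [frozenset(t) for t in transactions]

    n_total = len(baskets)
    n_x = sum(1 for t in baskets if x <= t)    # baskets containing X
    n_xy = sum(1 for t in baskets if xy <= t)  # baskets containing X and Y

    support = n_xy / n_total if n_total else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical toy data: 80 baskets, 35 of which contain diapers,
# and 28 of those 35 also contain beer.
baskets = (
    [{"diapers", "beer"}] * 28
    + [{"diapers", "milk"}] * 7
    + [{"milk"}] * 45
)

s, c = support_and_confidence(baskets, {"diapers"}, {"beer"})
print(f"diapers => beer (support={s:.0%}, confidence={c:.0%})")
```

This sketch treats each transaction as a flat set of items, which is the classic transactional setting; it does not model the hierarchical structure or element order that the abstract identifies as the additional challenges of XML data.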

Mining frequent association tag sequences for clustering XML documents

Mining Fuzzy Association Rules in Data Streams

Bottom-up Discovery of Frequent Rooted Unordered Subtrees

FXProj – A Fuzzy XML Documents Projected Clustering Based on Structure and Content

Xproj: A Framework For Projected Structural Clustering Of Xml Documents

Xml Structural Similarity Search Using Mapreduce

Tree model guided candidate generation for mining frequent subtrees from XML documents

A Model To Enhance Xml Document Clustering

Discovery of Frequent Query Patterns in XML Pattern Graph with DTD Cardinality Constraints

Efficient Algorithms For Association Finding And Frequent Association Pattern Mining

Efficient Mining of Frequent Closed XML Query Pattern.

Evaluate Structure Similarity In Xml Documents With Merge-Edit-Distance

A Novel Path-Based Method for Clustering XML Schemas

Semantic Text Mining with Linked Data

Efficient mining of frequent closed XML query pattern

XML-Enabled Association Analysis

Research on Evaluating Structural Similarity between XML Documents

FICW: Frequent Itemset Based Text Clustering with Window Constraint

Approximate top-k structural similarity search over XML documents

A Flexible Structured-based Representation for XML Document Mining

Improving Xml Querying with Maximal Frequent Query Patterns