Abstract:In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly complex model, the lack or the ignorance of the explicit document model (DTD-Document Type Definition, Schema, etc.) increases the risk of ob-taining an empty result set when the query is too specific, or, too large result set when it is too vague (e.g. it contains wildcards such as "*"). The reason is that in both cases, users write queries according to the document model they have in mind; this can be very far from the one that can actually be extracted from the document. Opposed to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evalua-tions. Indeed, during their evaluation, certain constraints (the preferences they contain) can be relaxed if necessary to avoid precisely empty results; moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the pro-posed algorithm, the best answers are obtained by using an adaptation of the Skyline operator (defined in relational databases) in the context of documents (trees) to incrementally filter into the partial solutions set, those which satisfy the maximum of preferential constraints. The only restriction imposed on documents is No-Self-Containment.

A Flexible Structured-based Representation for XML Document Mining

Tree model guided candidate generation for mining frequent subtrees from XML documents

Web mining of relations from XML and construct database schema

Xproj: A Framework For Projected Structural Clustering Of Xml Documents

PKU at INEX 2010 XML Mining Track

A Model To Enhance Xml Document Clustering

Enhancing Content-And-Structure Information Retrieval using a Native XML Database

An Xml-Enabled Association Rule Framework

Introducing XML to program mining

A Tree Pattern Matching Algorithm for XML Queries with Structural Preferences

Bottom-up Discovery of Frequent Rooted Unordered Subtrees

Data Mining-based Fragmentation of XML Data Warehouses

Mining XML-Enabled Association Rules with Templates

Xml Structural Similarity Search Using Mapreduce

A Rule-Based Information Extraction System for Human-Readable Semi-Structured Scientific Documents

An Index Scheme for XML Documents Based on Relationship Joins

FXProj – A Fuzzy XML Documents Projected Clustering Based on Structure and Content

X-WACoDa: An XML-based approach for Warehousing and Analyzing Complex Data

XML Views: Part 1

Interactive Mining of Schema for Semistructured Data.

XML-based Data Mining Model Construction