Abstract: Keyword search lets web users access XML data without understanding complex data schemas. However, the inherent ambiguity of keyword search makes it difficult to select relevant results that match the keywords. To address this problem, researchers have put considerable effort into ranking models that distinguish relevant from irrelevant passages, such as the highly cited TF*IDF and BM25. However, these statistics-based ranking methods mostly use term frequency, inverse document frequency, and length as ranking factors, and ignore the distribution of, and connections between, different keywords. As a result, these widely used methods fail to recognize irrelevant results that have high term frequency, which limits their performance. This paper proposes a new search system, XDist, to address these problems. XDist first uses the semantic query model maximal lowest common ancestor (MAXLCA) to identify the results returned for a given query, and then ranks these candidate results by BM25. In particular, XDist re-ranks the top several results by a combined distribution measurement (CDM), which considers four criteria: term proximity, intersection of keyword classes, degree of integration among keywords, and quantity variance of keywords. The weights of the four measures in CDM are trained by a listwise learning-to-optimize method. Experimental results on the INEX evaluation platform show that the CDM re-ranking method improves the performance of the BM25 baseline by 22% under iP[0.01] and 18% under MAiP. In addition, the semantic model MAXLCA and the search engine XDist each perform best in their respective related fields.
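
The two-stage ranking described in the abstract (a first-stage BM25 ordering, then re-ranking of only the top several results by a weighted combination of four measures) can be illustrated with a minimal sketch. The abstract does not give the form of CDM, so the weighted linear sum below is an assumption, as are the names `Candidate`, `cdm_score`, and `rerank_top_k`, the cutoff `k`, and all numeric values; the BM25 scores and the four measure values are taken as precomputed inputs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One candidate result. All names and values here are illustrative."""
    result_id: str
    bm25: float                # first-stage score, assumed precomputed
    measures: Sequence[float]  # four distribution measures, assumed scaled to [0, 1]


def cdm_score(measures: Sequence[float], weights: Sequence[float]) -> float:
    """Combine the measures. ASSUMPTION: a weighted linear sum."""
    if len(measures) != len(weights):
        raise ValueError("measures and weights must have the same length")
    return sum(w * m for w, m in zip(weights, measures))


def rerank_top_k(candidates: Sequence[Candidate],
                 weights: Sequence[float],
                 k: int) -> List[Candidate]:
    """Order by BM25, then re-order only the top k by the combined score."""
    by_bm25 = sorted(candidates, key=lambda c: c.bm25, reverse=True)
    head = sorted(by_bm25[:k],
                  key=lambda c: cdm_score(c.measures, weights),
                  reverse=True)
    # Results below the cutoff keep their BM25 order.
    return head + by_bm25[k:]


# Hypothetical data: "a" has the highest BM25 score but weak distribution measures.
candidates = [
    Candidate("a", 9.0, (0.1, 0.2, 0.1, 0.3)),
    Candidate("b", 8.0, (0.9, 0.8, 0.7, 0.6)),
    Candidate("c", 7.0, (0.5, 0.5, 0.5, 0.5)),
    Candidate("d", 1.0, (1.0, 1.0, 1.0, 1.0)),
]
weights = (0.25, 0.25, 0.25, 0.25)  # placeholder weights, not trained values
ranked = rerank_top_k(candidates, weights, k=3)
print([c.result_id for c in ranked])
```

In this sketch, "a" drops behind "b" and "c" after re-ranking, while "d" stays last despite its high measure values because it falls outside the top `k`. In the system described above the weights come from a listwise training step, which this sketch does not attempt to reproduce.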

Text Distinguishers Used in an Interactive Meta Search Engine

Differentiating Search Results on Structured Data

Structured Search Result Differentiation

Using Online Relevance Feedback to Build Effective Personalized Metasearch Engine

On-Line Selection Of Distinguishing Elements For Focused Information Retrieval

Exploiting Community Feedback for Information Retrieval in DHT Networks

Still Haven't Found What You're Looking For -- Detecting the Intent of Web Search Missions from User Interaction Features

A General Framework to Resolve the MisMatch Problem in XML Keyword Search

XDist: an Effective XML Keyword Search System with Re-Ranking Model Based on Keyword Distribution

Understanding Differential Search Index for Text Retrieval

XSACT: a comparison tool for structured search results

Meta-evaluation of Online and Offline Web Search Evaluation Metrics

TargetSearch: A Ranking Friendly XML Keyword Search Engine

Personalized Search: an Interactive and Iterative Approach

Personalized Search Research Based on Local Add-On

Interrogative-guided re-ranking for question-oriented software text retrieval

Refining the Results of Automatic e-Textbook Construction by Clustering

A HTML Parser to Improve Chinese Search Engines

Removing the Mismatch Headache in XML Keyword Search

Efficient Interactive Fuzzy Keyword Search

Combining Strategies for XML Retrieval