Abstract: Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked their interest, and a system should automatically formulate a search query that captures the essence of that text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method that uses phrase-level analysis to better quantify the relevance of candidate search terms drawn from input text. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and the performance of our system through experimental studies.
Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than with self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second while staying within current online responsiveness expectations (response times under 2 seconds at the highest loads tested).

Proximity full-text searches of frequently occurring words with a response time guarantee

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance

Selection of Optimal Parameters in the Fast K-Word Proximity Search Based on Multi-component Key Indexes

Relevance ranking for proximity full-text search based on additional indexes with multi-component keys

An efficient algorithm for three-component key index construction

Processing Long Queries Against Short Text

Processing Spatial Keyword Query As a Top-K Aggregation Query

Using Proximity in Query Focused Multi-document Extractive Summarization

Exploring and Exploiting Proximity Statistic for Information Retrieval Model

Efficient Computation of a Proximity Matching in Spatial Databases

Scalable Top-K Spatial Keyword Search

Using Additional Indexes for Fast Full-Text Search of Phrases That Contain Frequently Used Words

A Study on Query Terms Proximity Embedding for Information Retrieval

Keyword-based k-nearest neighbor search in spatial databases

Efficient Automatic Search Query Formulation Using Phrase-Level Analysis

Fast and Exact Nearest Neighbor Search in Hamming Space on Full-Text Search Engines

Efficient Algorithms for Top-k Keyword Queries on Spatial Databases

Keyword Search in Spatial Databases: Towards Searching by Document

Tk-Sk: Textual-Restricted K Spatial Keyword Query On Road Networks

TSS: Efficient Term Set Search in Large Peer-to-Peer Textual Collections