Abstract: Most queries in web search are ambiguous and multifaceted. Identifying the major senses and facets of queries from search log data, referred to as query subtopic mining in this paper, is an important problem in web search. Through search log analysis, we show that two phenomena of user behavior can be leveraged to identify query subtopics, referred to as 'one subtopic per search' and 'subtopic clarification by keyword'. One subtopic per search means that if a user clicks multiple URLs in a single search, the clicked URLs tend to represent the same sense or facet. Subtopic clarification by keyword means that users often add one or more keywords to a query to clarify their search intent, so the added keywords tend to be indicative of the sense or facet. We propose a clustering algorithm that leverages both phenomena to automatically mine the major subtopics of queries, where each subtopic is represented by a cluster containing a number of URLs and keywords. The mined subtopics can be used in multiple web search tasks; we evaluate them on two aspects of search result presentation, clustering and re-ranking. We demonstrate that our clustering algorithm can effectively mine query subtopics, with an F1 measure in the range of 0.896-0.956. Our experimental results show that using the subtopics mined by our approach can significantly improve state-of-the-art methods for search result clustering. Experimental results based on click data also show that re-ranking search results with our method can significantly improve how efficiently users find information.
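As a rough illustration of how the two phenomena could feed a clustering step, the sketch below groups URLs clicked in the same search and attaches to each group the keywords users added to the query. It is not the algorithm proposed in the paper: the function name `mine_subtopics`, the input layout, and the use of a simple union-find merge in place of a similarity-based clustering are all assumptions made for this example.

```python
from collections import defaultdict


def mine_subtopics(query, click_sessions, expanded_sessions):
    """Toy subtopic grouping (illustrative only, not the paper's method).

    click_sessions: list of URL lists, one list per search of `query`.
    expanded_sessions: list of (expanded_query, clicked_urls) pairs, where
        expanded_query is `query` plus one or more extra keywords.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # "One subtopic per search": URLs clicked in the same search are merged.
    for urls in click_sessions:
        for url in urls:
            find(url)  # register the URL even if it was clicked alone
        for url in urls[1:]:
            union(urls[0], url)

    # "Subtopic clarification by keyword": the added terms label the
    # cluster that the clicked URLs belong to.
    base_terms = set(query.lower().split())
    keyword_clicks = []
    for expanded_query, urls in expanded_sessions:
        extra = [t for t in expanded_query.lower().split() if t not in base_terms]
        if not extra or not urls:
            continue
        find(urls[0])
        for url in urls[1:]:
            union(urls[0], url)
        keyword_clicks.append((" ".join(extra), urls[0]))

    clusters = defaultdict(lambda: {"urls": set(), "keywords": set()})
    for url in list(parent):
        clusters[find(url)]["urls"].add(url)
    for keyword, url in keyword_clicks:
        clusters[find(url)]["keywords"].add(keyword)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical log entries for the ambiguous query "jaguar".
    subtopics = mine_subtopics(
        "jaguar",
        click_sessions=[["a.example/car", "b.example/car"], ["z.example/cat"]],
        expanded_sessions=[
            ("jaguar car", ["b.example/car", "c.example/dealer"]),
            ("jaguar animal", ["z.example/cat"]),
        ],
    )
    for cluster in subtopics:
        print(sorted(cluster["urls"]), sorted(cluster["keywords"]))
```

A hard merge like this is sensitive to noisy clicks, since a single search that mixes two senses would fuse their clusters; a practical system would more plausibly weight co-click and keyword evidence and cluster on a similarity threshold.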

Mining Subtopics from Text Fragments for a Web Query

Query Subtopic Mining Via Subtractive Initialization of Non-negative Sparse Latent Semantic Analysis

Mining Search Subtopics from Query Logs

Mining Query Subtopics from Search Log Data

Improve Web Search Diversification with Intent Subtopic Mining

Mining Query Subtopics from Questions in Community Question Answering

Mining and ranking users’ intents behind queries

Qualifier Mining for NTCIR-INTENT

HITSCIR System in NTCIR-9 Subtopic Mining Task

Summary of the NTCIR-10 INTENT-2 Task: Subtopic Mining and Search Result Diversification

ICRCS at Intent2: Applying Rough Set and Semantic Relevance for Subtopic Mining

IMC at the NTCIR-12 IMine-2 Query Understanding Subtask

Automatically Mining Facets for Queries from Their Search Results

HIT 2 Joint NLP Lab at the NTCIR-9 Intent Task

Bundle Fragments into a Whole: Mining More Complete Clusters via Submodular Selection of Interesting Webpages for Web Topic Detection

A Method of Mining Query Facets Based on Term Graph Analysis

Topic Distillation Via Sub-Site Retrieval

Mining Multi-Faceted Overviews of Arbitrary Topics in a Text Collection

Summary of the NTCIR-10 INTENT-2 task

Web Documents Mining

Mining Topic-specific Knowledge on Web