Contextualized web search: query-dependent ranking and social media search
H. Zha, Jiang Bian
Abstract: Due to the information explosion on the Internet, effective search techniques are required to retrieve desired information from the Web. Through analysis of users’ search intentions and the varied forms of Web content, we find that both the query and the indexed Web content are often associated with various kinds of context information, which can provide essential evidence of ranking relevance in Web search. Although many existing studies extract the context information of both the query and the Web content, little research has addressed how to exploit this context information to improve Web search. This dissertation seeks to develop new search algorithms and techniques that take advantage of rich context information to improve search quality.
This dissertation consists of two major parts. In the first, we study how to exploit the context information of the query to improve search performance. Since Web queries are usually very short, it is difficult to extract a precise information need from the query itself. We propose to take advantage of context information, such as the search intention of the query, to improve ranking relevance. Because queries differ in search intention, we first introduce a query-dependent loss function; optimizing it yields a better ranking model. However, in a practical search engine, it is difficult to define the query-dependent loss function precisely. In addition, inspired by the requirement for deep dives and incremental updates on dedicated ranking models, we investigate a divide-and-conquer framework for ranking specialization. Experimental results on a large-scale data set from a commercial search engine demonstrate significant improvements in search performance over currently deployed ranking models that do not consider query context.
The second part of this dissertation investigates how to extract the context of specific Web content and exploit it to build a more effective search system. This study focuses on searching over social media, a newly emerging form of Web content. As the fastest-growing segment of the Web, social media services establish new forums for content creation. Every day, huge amounts of social media content are collaboratively generated by millions of Web users, driven by various social activities. Because the resulting archives of both the content and the context of the interactions contain valuable information, computational methods for knowledge acquisition have become an important topic in social media analysis. Unlike traditional Web content, social media content is inherently associated with many new types of context information, including content quality, user reputation, and user interactions, all of which provide useful information for acquiring knowledge from social media. In this dissertation, we seek to develop algorithms and techniques for effective knowledge acquisition from collaborative social media environments by using this dynamic context information. In particular, this study first proposes a new general framework for searching social media content that integrates both content features and user interactions. Then, a semi-supervised framework is proposed to explicitly compute content quality and user reputation in social media. This new context information is incorporated into the general search framework to improve search quality. Experimental results from a large-scale evaluation on real-world social media content demonstrate that this research achieves significant improvements over previous approaches to information search in social media. Furthermore, this dissertation also investigates techniques for extracting the structured semantics of social media content.
Experimental results demonstrate that this kind of context information is essential for improving the performance of content organization and retrieval over social media services.
Computer Science