Object Summaries for Keyword Search.

Georgios John Fakas, Zhi Cai
DOI: https://doi.org/10.1142/s2529737617500022
2018-01-01
Abstract: The abundance and ubiquity of graphs (e.g., semantic knowledge graphs such as Google's knowledge graph and DBpedia; online social networks such as Google+ and Facebook; bibliographic graphs such as DBLP) necessitate effective and efficient search over them. We therefore propose a novel keyword search paradigm in which the result of a search is an Object Summary (OS). More precisely, given a set of keywords that can identify a Data Subject (DS), our paradigm produces a set of OSs as results. An OS is a tree structure rooted at the DS node (i.e., a node containing the keywords) with surrounding nodes that summarize all data held on the graph about the DS. An OS can potentially be very large and therefore unfriendly for users who wish to view synoptic information about the data subject. Thus, we investigate the effective and efficient retrieval of concise and informative OS snippets (denoted as size-l OSs). A size-l OS is a partial OS containing l nodes such that the summation of their importance scores results in the maximum possible total score. However, the set of nodes that maximizes the total importance score may result in an uninformative size-l OS, as very important nodes may be repeated in it, dominating other representative information. In view of this limitation, we investigate the effective and efficient generation of two novel types of OS snippets, i.e., diverse and proportional size-l OSs, denoted as DSize-l and PSize-l OSs. Namely, besides the importance of each node, we also consider its pairwise relevance (similarity) to the other nodes in the OS and the snippet. We conduct an extensive evaluation on two real graphs (DBLP and Google+). We verify effectiveness by collecting user feedback, e.g., by asking DBLP authors (i.e., the DSs themselves) to evaluate our results. In addition, we verify the efficiency of our algorithms and evaluate the quality of the snippets that they produce.
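To make the size-l definition concrete, the following is a minimal sketch of how the maximum total importance of an l-node partial OS could be computed on a small tree. It is not the authors' algorithm. It assumes, beyond what the abstract states, that the partial OS must stay connected and contain the DS root. The function name, the tree encoding (`children`, `score`), and the example scores are all hypothetical.

```python
from typing import Dict, List

NEG = float("-inf")  # marks "no subtree of this size exists"


def best_size_l_score(children: Dict[str, List[str]],
                      score: Dict[str, int],
                      root: str,
                      l: int) -> float:
    """Best total importance of a connected l-node subtree containing root.

    Assumption (not from the abstract): the snippet must be connected and
    must include the DS root. Uses a tree-knapsack dynamic program.
    """

    def solve(v: str) -> List[float]:
        # table[k] = best score of a connected subtree rooted at v with k nodes
        table: List[float] = [NEG] * (l + 1)
        if l >= 1:
            table[1] = score[v]
        for c in children.get(v, []):
            child = solve(c)
            merged = table[:]  # option: take nothing from this child
            for k in range(1, l + 1):
                if table[k] == NEG:
                    continue
                for j in range(1, l - k + 1):
                    if child[j] == NEG:
                        continue
                    merged[k + j] = max(merged[k + j], table[k] + child[j])
            table = merged
        return table

    return solve(root)[l]


# Hypothetical toy OS: an author (the DS) with two papers.
children = {
    "author": ["paper1", "paper2"],
    "paper1": ["coauthor"],
    "paper2": ["venue"],
}
score = {"author": 10, "paper1": 9, "paper2": 2, "coauthor": 1, "venue": 50}

best = best_size_l_score(children, score, "author", 3)
```

In this toy tree, the best 3-node snippet goes through the low-scoring `paper2` to reach the high-scoring `venue`, which a purely greedy choice of the best child would miss. The sketch covers only the importance-maximizing size-l case. The diverse and proportional variants additionally weigh pairwise similarity between nodes and are not modeled here.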