An Efficient Parallel Topic-Sensitive Expert Finding Algorithm Using Spark

Yao-ming Yang, Chang-dong Wang, Jian-huang Lai
DOI: https://doi.org/10.1109/bigdata.2016.7841019
2016-01-01
Abstract: Expert finding is an important technique for ranking user authority in community question answering (CQA) websites. ZhihuRank is a topic-sensitive expert finding algorithm based on both LDA and PageRank. As the number of participants and documents in CQA websites grows rapidly, how to parallelize expert finding algorithms for big data analysis has received significant attention. In this paper, we find that the Spark framework, a memory-based parallel computing model that supports complicated iterative algorithms, is more suitable for parallelizing expert finding algorithms than the MapReduce framework. As an example, we parallelize ZhihuRank using MLlib's LDA and GraphX's PageRank in Spark. Experiments have been conducted on large-scale real data from Zhihu (the most popular CQA website in China), and the experimental results confirm the effectiveness and scalability of the proposed approach.
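To illustrate what combining a topic model with PageRank can look like, the following is a minimal single-machine sketch of a generic topic-sensitive (personalized) PageRank. It is not the ZhihuRank formulation from the paper, which the abstract does not spell out, and it uses plain Python in place of MLlib's LDA and GraphX's PageRank. The function name `topic_pagerank`, the asker-to-answerer edge direction, the damping factor of 0.85, and the toy topic weights are all assumptions made for illustration; in practice the per-user topic weights would come from an LDA model.

```python
from typing import Dict, List, Tuple


def topic_pagerank(
    edges: List[Tuple[str, str]],
    topic_weight: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Personalized PageRank for one topic, computed by power iteration.

    edges: hypothetical (asker, answerer) pairs; authority flows to the answerer.
    topic_weight: user -> positive weight for this topic (e.g. an LDA probability).
    Every user that appears in an edge must also appear in topic_weight.
    """
    users = sorted(topic_weight)
    total = sum(topic_weight.values())
    # Normalized topic weights act as the teleport (restart) distribution.
    prior = {u: topic_weight[u] / total for u in users}

    out_links: Dict[str, List[str]] = {u: [] for u in users}
    for asker, answerer in edges:
        out_links[asker].append(answerer)

    rank = dict(prior)
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) * prior[u] for u in users}
        for u in users:
            if out_links[u]:
                # Split this user's rank evenly among the users who answered them.
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    nxt[v] += share
            else:
                # Dangling user: redistribute rank according to the topic prior.
                for v in users:
                    nxt[v] += damping * rank[u] * prior[v]
        rank = nxt
    return rank


# Toy data (made up): "c" answered questions from both "a" and "b".
toy_edges = [("a", "c"), ("b", "c")]
toy_weights = {"a": 0.2, "b": 0.3, "c": 0.5}
toy_ranks = topic_pagerank(toy_edges, toy_weights)
for user in sorted(toy_ranks, key=toy_ranks.get, reverse=True):
    print(user, round(toy_ranks[user], 4))
```

Running one such iteration per topic and then weighting the per-topic scores would give a topic-sensitive ranking; the iterative loop is the part that benefits from an in-memory framework, since each pass reuses the graph and the previous rank vector.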