Truth Discovery in Data Streams: A Single-Pass Probabilistic Approach.

Zhou Zhao,James Cheng,Wilfred Ng
DOI: https://doi.org/10.1145/2661829.2661892
2014-01-01
Abstract:Truth discovery is a long-standing problem for assessing the validity of information from various data sources that may provide different and conflicting information. With the increasing prominence of data streams arising in a wide range of applications such as weather forecast and stock price prediction, effective techniques for truth discovery in data streams are demanded. However, existing work mainly focuses on truth discovery in the context of static databases, which is not applicable in applications involving streaming data. This motivates us to develop new techniques to tackle the problem of truth discovery in data streams. In this paper, we propose a probabilistic model that transforms the problem of truth discovery over data streams into a probabilistic inference problem. We first design a streaming algorithm that infers the truth as well as source quality in real time. Then, we develop a one-pass algorithm, in which the inference of source quality is proved to be convergent and the accuracy is further improved. We conducted extensive experiments on real datasets which verify both the efficiency and accuracy of our methods for truth discovery in data streams.
What problem does this paper attempt to address?