Counteracting Novelty Decay in First Story Detection.

Yumeng Qin, Dominik Wurzer, Victor Lavrenko, Cunchen Tang
DOI: https://doi.org/10.1007/978-3-319-56608-5_48
2017-01-01
Abstract: In this paper we explore the impact of processing unbounded data streams on First Story Detection (FSD) accuracy. In particular, we study three different types of FSD algorithms: comparison-based, LSH-based and k-term-based FSD. Our experiments reveal for the first time that the novelty scores of all three algorithms decay over time. We explain why the decay is linked to increased space saturation and why it negatively affects detection accuracy. We provide a mathematical decay model, which allows compensating observed novelty scores by their expected decay. Our experiments show significantly increased performance when counteracting the novelty score decay.