Finding the Hottest Item in Data Streams

Huaizhong Lin, Shanshan Wu, Leong U. Hou, Ngai Meng Kou, Yunjun Gao, Dongming Lu
DOI: https://doi.org/10.1016/j.ins.2017.11.012
IF: 8.1
2017-01-01
Information Sciences
Abstract: We study the problem of finding the hottest item interval in a data stream, where the hotness of an item over an interval is determined by its average frequency. Finding the hottest item interval is particularly helpful in business promotions, such as monitoring peak sales records, finding the hottest period in an online game, or identifying the highest click rate of an online music track. Existing work focuses on finding the most frequent item over a fixed-length interval. However, these solutions cannot return the hottest interval, since the best length (i.e., the one that maximizes the average frequency) is unknown in advance. To discover the hottest item interval, a straightforward solution is to calculate the average frequencies of items for every possible interval length, which is too costly for stream applications. To efficiently compute the hottest item interval, we propose an algorithm that employs the arrival timestamps of items and reduces the search space using three pruning strategies. Extensive experiments show that the proposed algorithms can efficiently discover the hottest item interval on both real and synthetic datasets.
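To make the cost of the straightforward solution concrete, the following is a minimal sketch of a brute-force search over every interval. It is not the paper's algorithm and uses none of its pruning strategies. It rests on assumptions the abstract does not state: timestamps are non-negative integer ticks, average frequency is the number of arrivals of an item in [start, end] divided by the interval length (end - start + 1), and a hypothetical `min_len` parameter sets the shortest interval considered. The function name and input layout are likewise illustrative.

```python
from collections import defaultdict


def hottest_item_interval(stream, min_len=1):
    """Brute-force search for the hottest item interval.

    stream:  list of (timestamp, item) pairs; timestamps are assumed to be
             non-negative integer ticks (an assumption of this sketch).
    min_len: hypothetical minimum interval length, in ticks.

    Returns (item, start, end, avg) with the highest average frequency,
    or None if the stream is empty or no interval is long enough.
    """
    if not stream:
        return None

    horizon = max(t for t, _ in stream) + 1

    # Per-item arrival counts for every tick in [0, horizon).
    counts = defaultdict(lambda: [0] * horizon)
    for t, item in stream:
        counts[item][t] += 1

    best = None
    for item, per_tick in counts.items():
        # prefix[i] = number of arrivals of `item` in ticks [0, i).
        prefix = [0]
        for c in per_tick:
            prefix.append(prefix[-1] + c)

        # Try every interval [start, end] of length >= min_len.
        for start in range(horizon):
            for end in range(start + min_len - 1, horizon):
                length = end - start + 1
                avg = (prefix[end + 1] - prefix[start]) / length
                if best is None or avg > best[3]:
                    best = (item, start, end, avg)
    return best


if __name__ == "__main__":
    demo = [(0, "a"), (1, "b"), (2, "b"), (3, "b"), (3, "b"), (4, "a"), (5, "b")]
    print(hottest_item_interval(demo, min_len=2))
```

Even with prefix sums, this sketch examines a number of intervals quadratic in the stream horizon for each distinct item, which illustrates why enumerating every interval length is impractical on a stream.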