Discovering the k Representative Skyline Over a Sliding Window
Mei Bai, Junchang Xin, Guoren Wang, Luming Zhang, Roger Zimmermann, Ye Yuan, Xindong Wu
DOI: https://doi.org/10.1109/TKDE.2016.2546242
IF: 9.235
2016-08-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: A representative skyline contains $k$ skyline points that can represent its corresponding full skyline. The existing measuring criteria for $k$ representative skylines are designed specifically for static data and cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the $k$ representative skyline over data streams. First, we propose a new criterion for choosing $k$ skyline points as the $k$ representative skyline in data stream environments, termed the $k$ largest dominance skyline ($k$-LDS), which is representative of the entire data set and highly stable over streaming data. Second, we propose an efficient exact algorithm, called the Prefix-based Algorithm (PBA), to solve the $k$-LDS problem in a 2-dimensional space. The time complexity of PBA is only $\mathcal{O}((M-k)\times k)$, where $M$ is the size of the full skyline set. Third, the $k$-LDS problem in a $d$-dimensional ($d\ge 3$) space turns out to be very complex. Therefore, a greedy algorithm is designed to answer $k$-LDS queries. To further accelerate the calculation, we propose an $\epsilon$-greedy algorithm that achieves an approximation factor of $\frac{1}{(1+\epsilon)}(1-\frac{1}{\sqrt{e}})$. Experimental results on both synthetic and real-world data show that our $k$-LDS significantly outperforms its competitors in data stream environments. Furthermore, we demonstrate that the proposed $\epsilon$-greedy algorithm can solve $k$-LDS efficiently and with competitive accuracy.
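To make the terms in the abstract concrete, the following is a minimal Python sketch of skyline computation and a greedy choice of $k$ representatives over the contents of one window. It is not the paper's PBA or $\epsilon$-greedy algorithm. The abstract does not define the dominance measure behind $k$-LDS, so the sketch assumes a stand-in objective (the number of window points dominated by at least one chosen skyline point) and assumes that smaller values are better in every dimension. The function names `dominates`, `skyline`, and `greedy_representatives` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.

    Assumption: smaller values are preferred in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    return list(dict.fromkeys(sky))  # drop exact duplicates, keep input order


def greedy_representatives(points: List[Point], k: int) -> List[Point]:
    """Greedily pick up to k skyline points under a stand-in coverage objective.

    Assumption: the objective is the count of points dominated by at least
    one chosen point; the paper's actual k-LDS measure may differ.
    """
    sky = skyline(points)
    # Indices of the window points that each skyline point dominates.
    dominated_by: Dict[Point, Set[int]] = {
        s: {i for i, q in enumerate(points) if dominates(s, q)} for s in sky
    }
    chosen: List[Point] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(sky))):
        # Pick the skyline point that adds the most not-yet-covered points.
        best = max(
            (s for s in sky if s not in chosen),
            key=lambda s: len(dominated_by[s] - covered),
        )
        chosen.append(best)
        covered |= dominated_by[best]
    return chosen


# Contents of one (hypothetical) window of 2-dimensional points.
window = [(1, 9), (2, 6), (3, 7), (3, 8), (4, 4), (5, 5),
          (6, 6), (7, 2), (8, 3), (8, 2), (9, 1), (9, 9)]
print(skyline(window))
print(greedy_representatives(window, 2))
```

In a streaming setting this naive version would be rerun whenever the window slides, which is the cost that the algorithms described in the abstract aim to avoid.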