Constructing Cube Blocks Effectively for Stream Data Analysis

Lizheng Jiang, Dongqing Yang, Shiwei Tang, Xiuli Ma, Dehui Zhang
DOI: https://doi.org/10.1109/WAIMW.2006.10
2006-01-01
Abstract: With the rapid growth of the World Wide Web, many web-based applications generate tremendous amounts of data. Analyzing and mining such data is important for administrators and other users. Because the data grow continuously and rapidly, retrieving them efficiently and effectively is a challenging task. In this paper, we propose a new Cube Block model that organizes multi-dimensional data efficiently for online analysis and mining. The main idea of the Cube Block model is to split the whole data set into blocks by dividing time into exponential time frames. Within each block, we omit the individual time identifiers of data items and aggregate duplicate data items into a single data cell. The Cube Block model uses an approximation method to retrieve historical data for any time interval, and the approximation accuracy is guaranteed by an upper bound and a lower bound. To implement the Cube Block model, we develop the CB-Builder system, which scans the data and constructs cube blocks. We analyze the complexity of CB-Builder's algorithm (runtime and space cost). Experiments on synthetic data sets demonstrate that the Cube Block model, together with its implementation CB-Builder, is an effective and efficient tool for analyzing and mining web application data.
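The block construction described in the abstract can be illustrated with a minimal Python sketch. It is not the CB-Builder algorithm itself; the abstract does not give the frame boundaries or the query procedure, so everything here is an assumption made for illustration: block i is taken to hold items whose age (current time minus timestamp) lies in [2^i - 1, 2^(i+1) - 2], each cell is a simple count, and the names `block_index`, `build_blocks`, and `count_bounds` are hypothetical.

```python
from collections import Counter, defaultdict


def block_index(age):
    """Assumed exponential frames: block i holds ages 2**i - 1 .. 2**(i+1) - 2."""
    return (age + 1).bit_length() - 1


def block_age_range(i):
    """Inclusive age range covered by block i under the same assumption."""
    return 2**i - 1, 2 ** (i + 1) - 2


def build_blocks(items, now):
    """Group (timestamp, dims) items into blocks.

    Timestamps are dropped inside a block, and items with identical
    dimension values are aggregated into one cell holding a count.
    """
    blocks = defaultdict(Counter)
    for timestamp, dims in items:
        blocks[block_index(now - timestamp)][dims] += 1
    return blocks


def count_bounds(blocks, cell, min_age, max_age):
    """Lower and upper bound on the count of `cell` with age in [min_age, max_age].

    Blocks fully inside the interval count toward both bounds; blocks that
    only overlap it count toward the upper bound alone.
    """
    lower = upper = 0
    for i, cells in blocks.items():
        lo, hi = block_age_range(i)
        if hi < min_age or lo > max_age:
            continue  # block lies entirely outside the queried interval
        n = cells[cell]
        upper += n
        if min_age <= lo and hi <= max_age:
            lower += n
    return lower, upper


# Hypothetical page-view records: (timestamp, (page, country)).
items = [
    (10, ("home", "US")),
    (10, ("home", "US")),
    (9, ("home", "US")),
    (8, ("cart", "US")),
    (5, ("home", "US")),
    (2, ("home", "US")),
]
blocks = build_blocks(items, now=10)
bounds = count_bounds(blocks, ("home", "US"), min_age=0, max_age=4)
```

Because older blocks span longer time frames, recent data keeps fine time resolution while older data is summarized more coarsely, which is why a query over an arbitrary interval yields a pair of bounds rather than an exact count.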