ArchDB: A High Reliable and High Performance Large-Scale Archived Stream Database

Kai Du, Wei Fu, Huaimin Wang, Shuqiang Yang
2009-01-01
Journal of Computer Research and Development
Abstract: Monitoring online transactions or tracking users' behavior generates large-scale archived streaming data in domains such as scientific experiments, Web site access logs, and internal network audit logs. Such archive systems may scale up to petabytes (10^15 B). Storing and analyzing structured data at this scale raises at least three notable challenges. The first is data reliability. The second is efficiently storing and analyzing high-rate streaming data that is continuously generated online. The third is how to trade off high reliability against high performance in a single approach, because in many cases these two objectives conflict. A novel, highly reliable, log-free database architecture, ArchDB, is proposed. ArchDB consists of two key components: one loads and queries the small-scale current data, and the other stores and queries the large-scale historical archived data. To meet the three challenges, the data placement policy, the data block size, the timing of data archiving, and the pipelining and parallelization of the archiving procedure are all optimized. Experimental results show that ArchDB can double insertion performance and speed up the recovery process by a factor equal to the degree of parallel recovery.