Research and Implementation of Multi-Terabyte Data Real-Time Loading Technology
Weihong Han, Yan Jia, Shuqiang Yang
2009-01-01
Journal of Computer Research and Development
Abstract: With the rapid development of the Internet and communication technology, massive amounts of data have accumulated in many applications. Examples include Internet information security management; storage and processing of intermediate results from large-scale scientific computing (nuclear simulation, meteorological analysis); information from real-time monitoring systems (sensor networks, astronomical and meteorological monitoring systems, minor-planet monitoring systems); and analysis and research based on Internet information. Growing data volumes and the requirement for real-time loading pose enormous challenges to data-loading techniques. This paper presents IMIL (Internet Monitoring Information Loader), a real-time data-loading system used in a real-time Internet monitoring information system. IMIL comprises an extensible, fault-tolerant hardware architecture; an efficient bulk-loading algorithm that combines SQL*Loader with an exchange-partition mechanism; optimized parallelism; and guidelines for system tuning. Performance studies show the benefit of these techniques: the loading speed of each cluster increases from 220 million records per day to 1.2 billion records per day, and with 10 clusters running in parallel the system reaches a top loading speed of 6 TB of data. This framework offers a promising approach for loading other large and complex databases.
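The abstract names the core loading technique (SQL*Loader bulk loading combined with partition exchange) but gives no detail. The following is a minimal Python sketch of that general pattern, not the authors' IMIL implementation. It assumes an Oracle target table range-partitioned by day on a date column. It does not connect to a database. It only generates the steps for one batch: create an empty staging table, direct-path load the flat file into it with `sqlldr`, then swap the staging table into a new partition with `ALTER TABLE ... EXCHANGE PARTITION`. The table names, column list, file paths, daily partitioning scheme, and the helpers `LoadBatch` and `plan` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LoadBatch:
    # All fields are hypothetical illustrations, not taken from the paper.
    target_table: str   # range-partitioned target table (assumed: one partition per day)
    data_file: str      # comma-separated flat file produced by the capture side
    columns: list       # column names, in file order
    day: date           # the day this batch covers


def partition_name(day):
    # Hypothetical naming convention: P + YYYYMMDD.
    return f"P{day:%Y%m%d}"


def staging_table(batch):
    # A separate non-partitioned staging table per batch, so loading never
    # touches the partitions that queries are currently reading.
    return f"{batch.target_table}_STG_{batch.day:%Y%m%d}"


def control_file(batch):
    # SQL*Loader control file that loads the flat file into the staging table.
    cols = ", ".join(batch.columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{batch.data_file}'\n"
        "APPEND\n"
        f"INTO TABLE {staging_table(batch)}\n"
        "FIELDS TERMINATED BY ','\n"
        f"({cols})\n"
    )


def plan(batch, userid):
    """Return the ordered steps for one batch; nothing is executed here."""
    stg = staging_table(batch)
    part = partition_name(batch.day)
    upper = (batch.day + timedelta(days=1)).isoformat()
    ctl_path = f"{stg.lower()}.ctl"
    return {
        "ddl_before": [
            # Empty copy of the target's columns; CTAS yields a non-partitioned
            # table, which is what EXCHANGE PARTITION expects on the other side.
            f"CREATE TABLE {stg} AS SELECT * FROM {batch.target_table} WHERE 1 = 0",
            # Assumes the new bound is above all existing partitions
            # and that no MAXVALUE partition exists.
            f"ALTER TABLE {batch.target_table} ADD PARTITION {part} "
            f"VALUES LESS THAN (TO_DATE('{upper}', 'YYYY-MM-DD'))",
        ],
        "control_file": (ctl_path, control_file(batch)),
        # Direct-path load bypasses the conventional SQL insert path.
        "sqlldr_argv": [
            "sqlldr",
            f"userid={userid}",
            f"control={ctl_path}",
            f"log={stg.lower()}.log",
            "direct=true",
        ],
        "ddl_after": [
            # Metadata-only swap: the loaded rows become partition `part`.
            # If the target has local indexes, build matching indexes on the
            # staging table first and add INCLUDING INDEXES (omitted here).
            f"ALTER TABLE {batch.target_table} EXCHANGE PARTITION {part} "
            f"WITH TABLE {stg} WITHOUT VALIDATION",
            # After the swap the staging table holds the old (empty) partition.
            f"DROP TABLE {stg} PURGE",
        ],
    }


if __name__ == "__main__":
    b = LoadBatch("MONITOR_LOG", "/data/in/20090101.dat",
                  ["ts", "src_ip", "dst_ip", "payload"], date(2009, 1, 1))
    for step in plan(b, "loader/secret")["ddl_after"]:
        print(step)
```

The design point the sketch illustrates is that the expensive work (parsing and writing rows) happens against a table no query reads. The step that publishes the data is a dictionary-level exchange rather than a row copy. For scale, the reported 1.2 billion records per day per cluster averages roughly 13,900 records per second.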