Massive Geo-spatial Data Cloud Storage and Services Based on NoSQL Database Technique
Chongcheng CHEN,Jianfeng LIN,Xiaozhu WU,Jianwei WU,Huiqun LIAN
DOI: https://doi.org/10.3724/sp.j.1047.2013.00166
2013-01-01
Geo-information Science
Abstract: In recent years, with the explosive growth of Earth Observation System (EOS) data and the flourishing of the new geography paradigm, how to manage the storage of massive geo-spatial data efficiently and subsequently provide web services to a broad variety of users has become an increasingly hot issue in geographical information science. This paper proposes a cloud storage system that provides distributed, cloud-enabled storage management and services for massive geo-spatial data, integrating both vector and raster formats while taking their intrinsic differences into account. Based on a three-tier architecture, we put forward an implementation strategy and method for cloud storage management of raster and vector data respectively, built on NoSQL database systems and followed by a universal data access interface. Novel technologies for storing and processing massive vector data are introduced, including the distributed graph database Neo4J and a parallel graph computing framework. In our research, the distributed file system HDFS and the column-family database HBase serve as containers for massive raster data with a distributed spatial index technique, while the distributed graph database system Neo4J stores massive vector data, in view of ACID constraints, with an R-tree spatial index. Under the unified framework of the Geographical Knowledge Cloud platform GeoKSCloud, developed by our research group as a successor of GeoKSCloud, its core component, the spatial data aggregation centre (GeoDAC) software, has taken shape, with the aim of providing distributed spatial data storage management and access services for all types of end users. A testbed was established with 5 physical nodes and 7 virtual nodes in different areas and running different operating systems. We carried out an elaborate comparison between GeoDAC and the open source GIS software PostGIS to validate vector data read and write performance.
The preliminary results indicate that, although GeoDAC does not write faster than PostGIS, it achieves significantly better read and spatial query performance. Inside GeoDAC, space-partitioned massive data is distributed across the cluster and spatial query operations are executed in parallel, which speeds up spatial queries. The techniques and system developed in this work will provide a variety of users with a powerful tool for further in-depth processing and have broad application prospects.
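
The idea of space-partitioned data queried in parallel can be illustrated with a minimal sketch. It is not the GeoDAC implementation: a uniform grid stands in for the paper's partitioning and R-tree index, threads on one machine stand in for cluster nodes, and the names `CELL`, `cells_for`, `build_partitions` and `query` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

CELL = 10.0  # assumed grid cell size in map units (illustrative only)


def cells_for(bbox, cell=CELL):
    """Yield every grid cell (ix, iy) overlapped by a (minx, miny, maxx, maxy) box."""
    minx, miny, maxx, maxy = bbox
    for ix in range(int(minx // cell), int(maxx // cell) + 1):
        for iy in range(int(miny // cell), int(maxy // cell) + 1):
            yield (ix, iy)


def intersects(a, b):
    """Return True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_partitions(features):
    """Assign each feature to every grid cell its bounding box overlaps."""
    parts = defaultdict(list)
    for fid, bbox in features.items():
        for cell in cells_for(bbox):
            parts[cell].append((fid, bbox))
    return parts


def query(parts, window, workers=4):
    """Scan only the partitions touched by the window, one task per partition."""
    def scan(cell):
        return [fid for fid, bbox in parts.get(cell, []) if intersects(bbox, window)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(scan, cells_for(window))
    # A feature spanning several cells is reported once.
    return sorted({fid for part in hits for fid in part})


features = {
    "a": (1.0, 1.0, 2.0, 2.0),
    "b": (8.0, 8.0, 12.0, 12.0),
    "c": (25.0, 25.0, 26.0, 26.0),
}
parts = build_partitions(features)
result = query(parts, (0.0, 0.0, 11.0, 11.0))
```

Because each partition is scanned independently, the per-partition work could in principle be placed on separate nodes; only partitions overlapping the query window are touched at all.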