Vector Spatial Data Cloud Storage and Processing Based on MongoDB

LEI Delong, GUO Diansheng, CHEN Chongcheng, WU Jianwei, WU Xiaozhu
DOI: https://doi.org/10.3724/sp.j.1047.2014.00507
2014-01-01
Abstract: With the rapidly growing volume and complexity of spatial data, efficient storage and processing of massive geospatial data have become urgent research problems in GIScience and related fields. Vector spatial data are particularly challenging to store and process because of their complexity in representation, access, and analysis. In this paper, we present an approach and a system for cloud-based vector data storage and analysis that supports multi-user access and parallel processing. Our system, named VectorDB, extends MongoDB (a document-oriented NoSQL database system) and integrates the Hadoop framework for parallel spatial data processing and analysis. Built on a three-layered browser-server architecture, the system consists of a suite of modules for data storage, conversion, query, and analysis. The OGR Simple Features Library is integrated to convert data between MongoDB and various vector spatial data formats. We use the MongoDB Connector for Hadoop (mongo-hadoop) to transfer data between MongoDB and Hadoop. An experiment on five physical servers compares the performance of VectorDB and PostGIS for vector data reading, writing, and query. Preliminary results indicate that, although VectorDB is slightly slower in data writing, it performs significantly better than PostGIS in data access and spatial query. We also compared VectorDB with MongoDB for massive vector data processing; the results show that VectorDB performs better than MongoDB on this task. Unlike a traditional relational spatial database, VectorDB supports dynamic schemas and is therefore much more flexible and effective for storing, accessing, and analyzing various vector spatial data models and data formats. Our approach and the implemented system will be useful for a variety of applications that need to store and access vector spatial data in a cloud environment.
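The dynamic-schema point can be illustrated with a minimal Python sketch. It is not VectorDB's actual code: the document layout, the function names (`feature_to_document`, `bounding_box`, `bbox_intersects`), and the sample features are all hypothetical, and a plain list stands in for a MongoDB collection so the example runs without a database. It shows two features with different attribute sets stored side by side, followed by a simple bounding-box filter as a stand-in for a spatial query.

```python
# Illustrative sketch only: the layout and names below are assumptions,
# not taken from the paper. A Python list stands in for a collection.

def bounding_box(coords):
    """Return [min_x, min_y, max_x, max_y] for nested GeoJSON coordinates."""
    xs, ys = [], []

    def walk(node):
        # A position is a list whose first element is a number.
        if isinstance(node[0], (int, float)):
            xs.append(node[0])
            ys.append(node[1])
        else:
            for child in node:
                walk(child)

    walk(coords)
    return [min(xs), min(ys), max(xs), max(ys)]


def feature_to_document(feature, layer_name):
    """Map a GeoJSON-style feature to a schema-less document (hypothetical layout)."""
    return {
        "layer": layer_name,
        "geometry": feature["geometry"],
        # Attributes are copied as-is; no fixed column set is required.
        "properties": dict(feature.get("properties") or {}),
        "bbox": bounding_box(feature["geometry"]["coordinates"]),
    }


def bbox_intersects(a, b):
    """True if two [min_x, min_y, max_x, max_y] boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Made-up sample features with different attribute sets.
station = {
    "geometry": {"type": "Point", "coordinates": [119.3, 26.08]},
    "properties": {"name": "station-1", "elevation_m": 12},
}
road = {
    "geometry": {"type": "LineString", "coordinates": [[118.0, 24.4], [118.2, 24.6]]},
    "properties": {"lanes": 4},
}

collection = [
    feature_to_document(station, "stations"),
    feature_to_document(road, "roads"),
]

# Stand-in for a spatial query: keep documents whose bbox meets the window.
window = [119.0, 26.0, 120.0, 27.0]
hits = [doc for doc in collection if bbox_intersects(doc["bbox"], window)]
print([doc["layer"] for doc in hits])
```

In a real deployment the documents would be written through a MongoDB driver, and spatial filtering would more likely rely on a geospatial index such as MongoDB's `2dsphere` than on a hand-rolled bounding-box test. Whether VectorDB does so is not stated in the abstract.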