Abstract: Hyperspectral infrared atmospheric sounding data, characterized by high vertical resolution, play a crucial role in capturing three-dimensional atmospheric information. The hyperspectral infrared atmospheric sounders HIRAS/HIRAS-II, mounted on the FY3D/EF satellites, have established an initial global coverage network for atmospheric sounding. Collaborative observation by multiple satellites will improve both the coverage and the responsiveness of data acquisition, thereby enhancing the overall quality and reliability of the data. In response to the increasing number of channels, the rapid growth of data volume, and the specific requirements of multi-satellite joint observation applications, this paper introduces, for the first time, an efficient storage and indexing method for infrared hyperspectral sounding data within a distributed architecture. The proposed approach, built on the Kubernetes cloud platform, uses the Google S2 discrete grid spatial indexing algorithm to establish a grid-based hierarchical model for unified metadata-embedded documents. It also optimizes the rowkey design using the BPDS model, enabling distributed storage of the data in HBase. The experimental results demonstrate that the Google S2 grid-based embedded document model is queried more efficiently than the traditional flat model: for a dataset of 5 million records, its query time is only 35.6% of that of the flat model. The method also distributes data more evenly across the global grid than the H3 algorithm. With the BPDS model, the HBase distributed storage system balances the node load and counteracts the detrimental effects caused by the accumulation of time-series remote sensing images. This architecture significantly enhances both storage and query efficiency, laying a robust foundation for subsequent distributed computing.
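The load-balancing role of a rowkey can be illustrated with a generic salting pattern. The sketch below is not the BPDS model, whose details the abstract does not give; it is a minimal example that assumes a rowkey made of a hash-derived bucket prefix, a grid cell token, a sensor name, and a reversed timestamp. The function name `make_rowkey`, the constants `NUM_BUCKETS` and `MAX_TS`, and the cell token `"89c25"` are all hypothetical placeholders.

```python
import hashlib

NUM_BUCKETS = 8            # hypothetical number of salt buckets
MAX_TS = 10**13 - 1        # largest 13-digit millisecond timestamp


def make_rowkey(cell_token: str, sensor: str, ts_ms: int) -> str:
    """Build a salted rowkey: bucket | grid cell | sensor | reversed timestamp.

    Assumption: this layout is illustrative only and is not taken from the paper.
    """
    # The salt depends on the cell and sensor only, so every observation of
    # one cell/sensor pair lands in the same bucket and can be prefix-scanned.
    digest = hashlib.md5(f"{cell_token}|{sensor}".encode()).digest()
    bucket = digest[0] % NUM_BUCKETS
    # Reversing the timestamp makes newer rows sort before older ones.
    reversed_ts = MAX_TS - ts_ms
    return f"{bucket:02d}|{cell_token}|{sensor}|{reversed_ts:013d}"


if __name__ == "__main__":
    older = make_rowkey("89c25", "HIRAS", 1700000000000)
    newer = make_rowkey("89c25", "HIRAS", 1700000001000)
    print(older)
    print(newer)
```

Because the bucket prefix comes from a hash rather than from the acquisition time, consecutive time-series writes for different cells are spread over several key ranges instead of accumulating at the end of a single one, which is the kind of hotspot the abstract refers to.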

Distribution-Based Approach for Efficient Storage and Indexing of Massive Infrared Hyperspectral Sounding Data
