Abstract: This study proposes an HBase-based storage method for massive electronic archive data, aiming to improve the intelligence, efficiency, and retrieval performance of data storage. The results show that the method outperforms traditional database systems in write speed and query latency and is suitable for efficient storage and management of massive electronic archives. The accelerating digitalization of enterprise and university education management has generated a massive amount of electronic archive data. To improve the intelligence, storage quality, and efficiency of electronic records management and to achieve efficient storage and fast retrieval, this study proposes a massive data storage model based on HBase together with a retrieval optimization scheme. HDFS is introduced to construct a two-level storage structure and optimize values, improving the scalability and load balancing of HBase, and the retrieval efficiency of the HBase storage model is improved through SL-TCR and BF filters. The results indicate that HDFS automatically recovered data after node, network partition, and NameNode failures. The write time of HBase was 56 s, which was 132 s and 246 s less than that of Cassandra and CockroachDB, respectively. Query latency was reduced by 23% and 32%, and query time was reduced by 9988.51 ms, demonstrating high reliability and efficiency. After 1000 searches, the delay of BF-SL-TCL was 1379.28 s, which was 224.78 s and 212.74 s less than that of SL-TCL and Blockchain Retrieval Acceleration, respectively, showing reduced delay at high search counts. In summary, this storage model has clear advantages in storing massive amounts of electronic archive data and offers high security and retrieval efficiency, providing an important reference for the design of storage models for future electronic archive management.
The storage model designed in this study has clear advantages in storing massive electronic archive data: it addresses the lack of scalability that electronic archive management faces with massive data, and it offers high security and retrieval efficiency. It provides an important reference for the design of storage models for future electronic archive management.
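The abstract attributes part of the retrieval speed-up to BF filters. Assuming "BF" denotes a Bloom filter (the abstract does not expand the abbreviation), the general idea can be sketched as below: a compact bit array answers "definitely absent" or "possibly present" for a row key, so lookups for absent keys can skip the expensive store read. This is a minimal illustration, not the study's implementation; the class, the `lookup` helper, the row keys, and the sizing parameters are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def lookup(row_key, bloom, store):
    """Skip the (expensive) store read when the filter rules the key out."""
    if not bloom.might_contain(row_key):
        return None
    return store.get(row_key)


# Hypothetical archive row keys, for illustration only.
store = {"archive-0001": "contract.pdf", "archive-0002": "transcript.pdf"}
bloom = BloomFilter()
for key in store:
    bloom.add(key)
```

In this sketch a plain dictionary stands in for the storage layer. In a real deployment the saving would come from avoiding disk or network reads for keys the filter rules out.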

Massive Data HBase Storage Method for Electronic Archive Management
