Abstract

Background: Resolution estimation is the main criterion for evaluating reconstructed macromolecular 3D structures in cryo-electron microscopy (cryo-EM). Many methods exist for evaluating the 3D resolution of structures reconstructed by Single Particle Analysis (SPA) in cryo-EM and by subtomogram averaging (SA) in electron cryotomography (cryo-ET). Most are global methods: they measure the resolution of the structure as a whole and are therefore inaccurate at detecting subtle local changes in a reconstruction. A few local resolution methods have been proposed to detect such changes in SPA and SA reconstructions. The mainstream local resolution methods are based on local Fourier shell correlation (FSC), which is computationally intensive. Moreover, existing implementations are multi-threaded programs that run on a single computer and scale very poorly.

Results: This paper proposes a new fine-grained 3D array partition method that uses a key-value (K-V) format in Spark. The method first converts 3D images to K-V data, then uses the K-V data for 3D array partitioning and data exchange in parallel. A Spark-based distributed parallel computing framework built this way addresses the scalability problem described above: all 3D local FSC tasks are calculated simultaneously across multiple nodes of a computer cluster. On experimental data, the 3D local resolution evaluation algorithm based on Spark fine-grained 3D array partitioning speeds up computation by an order of magnitude compared with the mainstream FSC algorithm while leaving accuracy unchanged, and it offers better fault tolerance and scalability.

Conclusions: We proposed a K-V format based fine-grained 3D array partition method in Spark that calculates 3D FSC in parallel to obtain a 3D local resolution density map. This map evaluates the three-dimensional density maps reconstructed from single particle analysis and subtomogram averaging. The proposed method significantly increases the speed of 3D local resolution evaluation, which is important for efficiently detecting subtle variations among reconstructed macromolecular structures.
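The idea of converting a 3D image to key-value data and partitioning it can be illustrated with a minimal pure-Python sketch. The abstract does not specify the key scheme, so everything here is an assumption made for illustration: the helper names `to_key_value` and `partition_by_block`, the use of voxel coordinates `(z, y, x)` as keys, and the grouping of voxels into non-overlapping cubic blocks. It is not the paper's implementation, and it does not cover the data exchange between neighbouring partitions or the FSC computation itself.

```python
from collections import defaultdict


def to_key_value(volume):
    """Flatten a 3D nested list into ((z, y, x), value) pairs (hypothetical key scheme)."""
    return [
        ((z, y, x), value)
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
    ]


def partition_by_block(pairs, block):
    """Group voxel pairs by the index of the cubic block of edge `block` that contains them."""
    blocks = defaultdict(list)
    for (z, y, x), value in pairs:
        block_key = (z // block, y // block, x // block)
        blocks[block_key].append(((z, y, x), value))
    return dict(blocks)


# Toy 4x4x4 volume whose voxel value encodes its own position.
size = 4
volume = [
    [[z * 100 + y * 10 + x for x in range(size)] for y in range(size)]
    for z in range(size)
]

pairs = to_key_value(volume)                  # one K-V pair per voxel
blocks = partition_by_block(pairs, block=2)   # 2x2x2-voxel partitions
```

In a Spark setting, the grouping step would plausibly correspond to mapping each voxel to its block key and then grouping by key, so that each block can be processed as an independent task on a different node. A local FSC window normally needs voxels from neighbouring blocks as well, which is presumably where the data exchange mentioned in the abstract comes in.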

scSparkXMBD - High-Performance scRNA-seq Data Processing with Spark

SCAN: A Smart Application Platform for Empowering Parallelizations of Big Genomic Data Analysis in Clouds

Accelerating Large-Scale Genomic Analysis With Spark

A Spark ML driven preprocessing approach for deep learning based scholarly data applications

A Survey on Spark Ecosystem for Big Data Processing

Sparksw: Scalable Distributed Computing System For Large-Scale Biological Sequence Alignment

scDAPP: a comprehensive single-cell transcriptomics analysis pipeline optimized for cross-group comparison

SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies

ScSmOP: a universal computational pipeline for single-cell single-molecule multiomics data analysis

Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform

scX: A user-friendly tool for scRNA-seq exploration

scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing

scExplorer: A Comprehensive Web Server for Single-Cell RNA Sequencing Data Analysis

Distributed Gene Clinical Decision Support System Based on Cloud Computing

SparkGC: Spark based genome compression for large collections of genomes

Accelerating Single-Cell Sequencing Data Analysis with SciDAP: A User-Friendly Approach

SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data

scX: a user-friendly tool for scRNAseq exploration

Spark-based parallel calculation of 3D fourier shell correlation for macromolecule structure local resolution estimation

scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting