Abstract: The rapid growth in genomic pathogen data spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate the parameters of phylogenetic models in which the number of parameters grows with the number of sequences $N$. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters, which traditionally takes $\mathcal{O}(N^2)$ operations using the standard pruning algorithm. A recent study proposes an approach that calculates this gradient in $\mathcal{O}(N)$ operations, enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the gradient calculation computationally tractable for nucleotide-based models but falls short in performance for models with larger state spaces, such as codon models. Here, we describe novel massively parallel algorithms that calculate the gradient of the log-likelihood wrt all BLS parameters on graphics processing units (GPUs) and deliver many-fold speedups over previous CPU implementations. We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples (carnivores, dengue and yeast) and observe a greater than 128-fold speedup over the CPU implementation for codon-based models and a greater than 8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task that was previously intractable. We provide an implementation of our GPU algorithms in BEAGLE v4.0.0, an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs.
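
To make the linear-time gradient idea concrete, the following is a minimal single-threaded sketch, not the GPU algorithm or the BEAGLE implementation described above. It assumes a single alignment site, the JC69 nucleotide model (chosen only because its transition matrix has a closed form), and a rooted binary tree. All names (`jc69`, `matvec`, `log_likelihood_and_gradient`, the toy tree) are hypothetical. One post-order pass computes the partial likelihoods below each node, and one pre-order pass combines them with the partial likelihoods outside each subtree to obtain the derivative of the log-likelihood wrt every branch length.

```python
import math

STATES = 4  # nucleotide states A, C, G, T


def jc69(t):
    """Return the JC69 transition matrix P(t) and its derivative dP/dt."""
    e = math.exp(-4.0 * t / 3.0)
    P = [[0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
          for b in range(STATES)] for a in range(STATES)]
    dP = [[-e if a == b else e / 3.0
           for b in range(STATES)] for a in range(STATES)]
    return P, dP


def matvec(M, v):
    """Multiply a STATES x STATES matrix by a length-STATES vector."""
    return [sum(M[a][b] * v[b] for b in range(STATES)) for a in range(STATES)]


def log_likelihood_and_gradient(children, lengths, tips, root):
    """Single-site log-likelihood and its gradient wrt every branch length."""
    post = {}  # post[node][s]: likelihood of the data below node given state s

    def down(node):  # post-order pass (the pruning algorithm)
        if node in tips:
            post[node] = [1.0 if s == tips[node] else 0.0
                          for s in range(STATES)]
            return
        vec = [1.0] * STATES
        for child in children[node]:
            down(child)
            msg = matvec(jc69(lengths[child])[0], post[child])
            vec = [x * y for x, y in zip(vec, msg)]
        post[node] = vec

    down(root)
    likelihood = sum(0.25 * x for x in post[root])  # uniform root frequencies
    grad = {}

    def up(node, above):  # pre-order pass
        # above[s]: likelihood of the data outside node's subtree, with node in s
        for child in children.get(node, []):
            outside = list(above)
            for sib in children[node]:
                if sib != child:
                    msg = matvec(jc69(lengths[sib])[0], post[sib])
                    outside = [x * y for x, y in zip(outside, msg)]
            P, dP = jc69(lengths[child])
            # dL/dt = outside^T (dP/dt) post[child]; divide by L for the log scale
            d = sum(outside[a] * dP[a][b] * post[child][b]
                    for a in range(STATES) for b in range(STATES))
            grad[child] = d / likelihood
            up(child, [sum(outside[a] * P[a][b] for a in range(STATES))
                       for b in range(STATES)])

    up(root, [0.25] * STATES)
    return math.log(likelihood), grad


# Hypothetical toy tree ((0,1)3,2)4: tips 0-2, internal node 3, root 4.
children = {4: [3, 2], 3: [0, 1]}
lengths = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.05}  # branch above each non-root node
tips = {0: 0, 1: 0, 2: 2}                    # observed state index at each tip
loglik, grad = log_likelihood_and_gradient(children, lengths, tips, root=4)
```

Each node is visited once in each pass, so the work grows linearly with the number of branches. Recomputing the likelihood separately for every branch, as a naive approach would, repeats the post-order pass $N$ times. The per-state arithmetic inside each visit is what a many-core implementation would distribute across threads.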

Parallel Accelerated Custom Correlation Coefficient Calculations for Genomics Applications

Gene Sequence Alignment on a Public Computing Platform

Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data

The parallelism motifs of genomic data analysis

Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

A high-performance computing toolset for relatedness and principal component analysis of SNP data

FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks

Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis

Accelerating genomic workflows using NVIDIA Parabricks

High-Performance Genomic Analysis Heterogeneous System Using OpenCL

An efficient parallel algorithm for multiple sequence similarities calculation using a low complexity method

Parallelization of Bayesian Network Based SNPs Pattern Analysis and Performance Characterization on SMP/HT

A Hybrid Computational Grid Architecture for Comparative Genomics

Accelerating Genome-Wide Association Studies Using CUDA Compatible Graphics Processing Units

Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware

Many-core algorithms for high-dimensional gradients on phylogenetic trees

A Parallel Implementation for Determining Genomic Distances under Deletion and Insertion

Scalable and efficient DNA sequencing analysis on different compute infrastructures aiding variant discovery

Parallel Algorithms for Large-Scale Biological Sequence Alignment on Xeon-Phi Based Clusters

A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-enabled Graphics Hardware