High Performance Computational Biology and Drug Design on TianHe Supercomputers.

Shaoliang Peng
DOI: https://doi.org/10.1109/bibm.2016.7822480
2016-01-01
Abstract: Summary form only given. Extremely powerful computers are needed to help scientists tackle problems in high performance computational biology and drug design. BGI, the world's largest genomics institute, currently generates 6 TB of data each day. The European Bioinformatics Institute (EBI) in Hinxton currently stores 20 petabytes (1 petabyte is 10^15 bytes) of data and backups about genes, proteins and small molecules. TianHe supercomputers can speed up computational biology and drug design processing. In 2013, 2014, and 2015, Tianhe-2 topped the TOP500 list of the fastest supercomputers in the world. Many well-known bioinformatics and drug design software packages (BWA, DOCK, SOAP3-dp, SOAPdenovo, SOAPsnp, etc.) have been developed and run on TH-2. The talk focuses on two main areas:

1. Drug Design: mD3DOCKxb is the largest high-throughput molecular docking platform; it finishes docking all purchasable molecules on earth (about 42 million) within 24 hours. It achieves a parallel efficiency of over 70% using 192,000 CPU cores and 1,368,000 MIC cores. It won the Gold Award of PAC 2015 (Parallel Application Challenge Competition) and was reported by CCTV 1, ScienceNet, China Science and Technology News, and the 2015 Top 10 News of Hunan Province of China.

2. Genetic Engineering: The "Human Whole Genome Re-sequencing Analysis Software Pipeline" was first designed by the applicant. The whole analysis procedure takes 4 hours to process a 300 TB dataset of whole-genome sequences from 2,000 people, a speedup of about 1200X.

TianHe supercomputers can handle three kinds of computational biology and drug design problems: computation-intensive, memory-intensive, and communication-intensive. In the future, TH-2 will be open online to scientists not only in China but all over the world.
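To put the quoted figures in perspective, the following minimal Python sketch derives average rates from the numbers in the abstract. It is an illustration only, not code from the original work. It assumes that "speedup" means serial runtime divided by parallel runtime and uses decimal units (1 TB = 1000 GB); the function names are hypothetical.

```python
"""Back-of-the-envelope rates derived from figures quoted in the abstract.

Assumptions (not stated in the abstract): speedup = serial time / parallel
time, and decimal units (1 TB = 1000 GB). Function names are hypothetical.
"""

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def docking_rate(molecules: int, hours: float) -> float:
    """Average number of molecules docked per second."""
    return molecules / (hours * SECONDS_PER_HOUR)


def implied_serial_days(parallel_hours: float, speedup: float) -> float:
    """Serial runtime (in days) implied by a parallel runtime and a speedup."""
    return parallel_hours * speedup / HOURS_PER_DAY


def data_rate_gb_per_s(terabytes: float, hours: float) -> float:
    """Average amount of data processed per second, in GB/s."""
    return terabytes * 1000 / (hours * SECONDS_PER_HOUR)


def per_item(total: float, count: int) -> float:
    """Average share of a total per item (e.g. GB per genome)."""
    return total / count


if __name__ == "__main__":
    # mD3DOCKxb: about 42 million molecules docked within 24 hours.
    print(f"docking: {docking_rate(42_000_000, 24):.0f} molecules/s")
    # Re-sequencing pipeline: 300 TB from 2,000 genomes in 4 hours, ~1200X speedup.
    print(f"serial equivalent: {implied_serial_days(4, 1200):.0f} days")
    print(f"throughput: {data_rate_gb_per_s(300, 4):.1f} GB/s")
    # 300 TB = 300,000 GB spread over 2,000 genomes.
    print(f"data per genome: {per_item(300_000, 2000):.0f} GB")
```

Under these assumptions, the docking run averages roughly 490 molecules per second. The re-sequencing figures correspond to about 200 days of serial work compressed into 4 hours, at roughly 21 GB/s and about 150 GB per genome.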