Recommendations for Bioinformatics in Clinical Practice
Ksenia Lavrichenko, Emilie Sofie Engdal, Rasmus L. Marvig, Anders Jemt, Jone Marius Vignes, Henrikki Almusa, Kristine Bilgrav Saether, Eirikur Briem, Eva Caceres, Edda Maria Elvarsdottir, Magnus Halldor Gislason, Maria K. Haanpaa, Viktor Henmyr, Ronja Hotakainen, Eevi Kaasinen, Roan Kanninga, Sofia Khan, Mary Gertrude Lie-Nielsen, Majbritt Busk Madsen, Niklas Mahler, Khurram Maqbool, Ramprasad Neethiraj, Karl Nyren, Minna Paavola, Peter Pruisscher, Ying Sheng, Ashish Kumar Singh, Aashish Srivastava, Thomas K. Stautland, Daniel T. Andreasen, Esmee ten Berk de Boer, Soren Vang, Valtteri Wirta, Frederik Otzen Bagger
DOI: https://doi.org/10.1101/2024.11.23.624993
2024-11-26
Abstract: Next Generation Sequencing (NGS) is increasingly used in clinical diagnostics, largely driven by the success and robustness of Whole Genome Sequencing (WGS). Although updated guidelines exist for interpreting and reporting variants identified from NGS data by bioinformatics pipelines, standardised bioinformatics practices are needed in diagnostics to ensure clinical consensus, accuracy, reproducibility and comparability of results. This article presents consensus recommendations developed by expert bioinformaticians working in clinical production at 13 clinical bioinformatics units taking part in the Nordic Alliance for Clinical Genomics (NACG). The recommendations are based on clinical practice and focus on analysis types, testing and validation, standardisation and accreditation, as well as the core competencies and technical management required for clinical bioinformatics operations.
Key recommendations include adopting the hg38 genome build as the reference and a standard set of recommended analyses, including the use of multiple tools for structural variant (SV) calling and in-house data sets for filtering recurrent calls. Clinical bioinformatics production should operate under the ISO 15189 standard, utilising off-grid clinical-grade high-performance computing systems, standardised file formats, and strict code version control. Containerised software or environment management systems are needed to ensure reproducibility.
Pipelines should be rigorously documented and tested for accuracy and reproducibility, minimally covering unit, integration, and end-to-end testing. Standard truth sets such as GIAB and SEQC2 for germline and somatic variant calling, respectively, should be supplemented by recall testing of previously validated clinical cases. Data integrity must be verified using file hashing, and sample identity should be checked via sample fingerprinting and genetically inferred identification markers such as sex and relatedness.
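As an illustration of the file-hashing recommendation above, the following is a minimal sketch in Python, not a prescribed implementation. It assumes SHA-256 as the hash algorithm (the recommendations do not name one), and the function names `sha256_of_file` and `verify_file` are hypothetical. The file is read in chunks so that large sequencing files need not fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks.

    SHA-256 is an assumption made for this sketch; any agreed
    cryptographic hash could be substituted.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True if the file's digest matches the recorded checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

In such a setup, the digest would be recorded when a file is first produced (for example, at the sequencer or at pipeline output) and `verify_file` would be called after every transfer or before analysis; a mismatch indicates that the file was altered or corrupted in transit.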
Finally, clinical bioinformatics teams should encompass diverse skills, including software development, data management, quality assurance, and domain expertise in human genetics. These recommendations provide a consensus framework for standardising bioinformatics practices across clinical WGS applications and can serve as a practical guide to facilities that are new to large-scale sequencing-based diagnostics, or as a reference for those who already run high-volume clinical production using NGS.
Bioinformatics