snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data
Christina Vasilopoulou, Benjamin Wingfield, Andrew P. Morris, William Duddy
DOI: https://doi.org/10.48550/arXiv.2105.01923
2021-05-05
Abstract: Motivation: Quality control of genomic data is an essential but complicated multi-step procedure, often requiring separate installation and expert familiarity with a combination of disparate bioinformatics tools. Results: To provide an automated solution that retains comprehensive quality checks and flexible workflow architecture, we have developed snpQT, a scalable, stand-alone software pipeline, offering some 36 discrete quality filters or correction steps, with plots before-and-after user-modifiable thresholding. This includes build conversion, population stratification against 1,000 Genomes data, population outlier removal, and built-in imputation with its own pre- and post- quality controls. Common input formats are used and users need not be superusers nor have any prior coding experience. A comprehensive online tutorial and installation guide is provided through to GWAS (https://snpqt.readthedocs.io/en/latest/), introducing snpQT using a synthetic demonstration dataset and a real-world Amyotrophic Lateral Sclerosis SNP-array dataset. Availability: snpQT is open source and freely available at https://github.com/nebfield/snpQT. Contact: Vasilopoulou-C@ulster.ac.uk, w.duddy@ulster.ac.uk
Genomics