Abstract

Background
In complex automated clinical laboratory next-generation sequencing (NGS) workflows, numerous opportunities exist for contamination events to occur. For oncology testing applications, detecting these events is crucial for accurate results reporting. A contamination detection module (MICon) using microhaplotype (MH) regions and an in-house designed analysis model was developed for use in an NGS assay for myeloid neoplasms (NGSHM) [1]. Variant detection for NGSHM has a validated analytical sensitivity of 2% variant allele frequency (VAF). Gross contamination events can cause false-positive variant calls, and they can also mask true low-VAF mutations so that these evade detection. This study evaluated MICon module performance on samples processed over the course of a year.

Methods
MH regions contain highly conserved tandem single nucleotide polymorphisms (SNPs) with high global heterogeneity occurring within a 300-nucleotide span. From the work of Kidd et al [2], 27 MH regions covering 92 SNPs and spanning 16 chromosomes were selected for inclusion in the 47-gene NGSHM target-capture panel. Sequencing was performed on a NovaSeq 6000, and data were analyzed through an internal bioinformatics pipeline that incorporates the MICon module. MICon uses a binary classification model incorporating VAFs from MH regions, the number of MH genotypes, and the contamination estimation score from verifyBamID to compute a single value on a 0-100 scale. Scores <50 are classified as non-contaminated, and scores ≥50 require further investigation.

Results
The dataset comprised 10,990 samples. A total of 10,232 patient cases (93.1%) had non-contaminated MICon scores <50. Control replicates accounted for 261 samples (2.4%). There were 497 patient cases (4.5%) with MICon scores ≥50. Of these 497 patient cases, 137 (27.6%) had undergone hematopoietic stem cell transplantation (HSCT) and 114 (22.9%) contained gross chromosomal abnormalities, representing iatrogenic or tumor biological factors. An additional 135 cases (27.2%) were contaminated through laboratory processing errors, 45 (9.1%) resulted from chemistry failures during run processing, and 66 (13.2%) had an undetermined cause for the high score, representing laboratory-related and unknown factors. Overall, the 135 laboratory processing errors detected accounted for 1.3% of patient cases analyzed by the laboratory. These cases were repeated, and the non-contaminated results were accurately reported.

Conclusions
This study shows that implementing MICon improved patient safety over the course of a year. The 1.3% of patient cases identified as arising from laboratory processing errors were successfully reprocessed. Although the overall rate of laboratory processing errors appears low, the absolute number can be significant in a high-volume test setting, even with highly automated processes. MICon adds value by identifying errors that could result in the release of inaccurate patient results. By detecting previously unrecognized errors, MICon has become an invaluable asset to the NGSHM assay, enhancing patient laboratory testing safety.

References
[1] Balan J, et al. MICon Contamination Detection Workflow for Next-Generation Sequencing Laboratories Using Microhaplotype Loci and Supervised Learning. J Mol Diagn. 2023;25(8):602-610. doi:10.1016/j.jmoldx.2023.05.001.
[2] Kidd KK, et al. Selecting microhaplotypes optimized for different purposes. Electrophoresis. 2018;39(21):2815-2823. doi:10.1002/elps.201800092.
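The score cutoff and the headline percentages reported in the abstract above can be reproduced with a short sketch. This is not the MICon model itself, which is described in [1]. It only applies the stated cutoff of 50 and tallies the reported counts. The names `triage`, `pct`, and `causes` are hypothetical, and the denominator for the 1.3% figure is assumed to be patient cases only (10,232 + 497), excluding control replicates.

```python
THRESHOLD = 50  # from the abstract: scores >= 50 require further investigation


def triage(score: float) -> str:
    """Apply the stated cutoff to a MICon score on the 0-100 scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie on the 0-100 scale")
    return "investigate" if score >= THRESHOLD else "non-contaminated"


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts exactly as reported in the Results.
total_samples = 10_990
below_cutoff = 10_232   # patient cases with score < 50
controls = 261          # control replicates
flagged = 497           # patient cases with score >= 50

# Reported causes for the flagged patient cases.
causes = {
    "HSCT": 137,
    "gross chromosomal abnormality": 114,
    "laboratory processing error": 135,
    "chemistry failure": 45,
    "undetermined": 66,
}

# Assumption: "patient cases analyzed" excludes control replicates.
patient_cases = below_cutoff + flagged

print(pct(below_cutoff, total_samples))   # share of samples below the cutoff
print(pct(controls, total_samples))       # share that are control replicates
print(pct(flagged, total_samples))        # share flagged for investigation
print(pct(causes["laboratory processing error"], patient_cases))
```

Under that assumption, the three sample groups sum to 10,990, the five causes sum to 497, and the laboratory processing error rate rounds to the reported 1.3%.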

Impact of Interval Censoring on Data Accuracy and Machine Learning Performance in Biological High-Throughput Screening

Machine Learning-Driven Data Valuation for Optimizing High-Throughput Screening Pipelines

Probabilistic PCA of Censored Data: Accounting for Uncertainties in the Visualization of High-Throughput Single-Cell qPCR Data

FUSE: Improving the estimation and imputation of variant impacts in functional screening

Integrated approach to generate artificial samples with low tumor fraction for somatic variant calling benchmarking

Data Valuation: A novel approach for analyzing high throughput screen data using machine learning

High efficiency error suppression for accurate detection of low-frequency variants

Data Tells the Truth: A Knowledge Distillation Method for Genomic Survival Analysis by Handling Censoring

Statistical Process Control Charts for Monitoring Next-Generation Sequencing and Bioinformatics Turnaround in Precision Medicine Initiatives

The impact of different censoring methods for analyzing survival using real-world data with linked mortality information: a simulation study

Gene Screening in High-Throughput Right-Censored Lung Cancer Data

α-KIDS: A novel feature evaluation in the ultrahigh-dimensional right-censored setting, with application to Head and Neck Cancer

Inference for High-Dimensional Censored Quantile Regression

Issues of Z-factor and an approach to avoid them for quality control in high-throughput screening studies

B-232 One Year Post-Implementation Experience Using a Custom Designed DNA Contamination Detection Module for a Clinical Myeloid Neoplasm Next Generation Sequencing Assay

System-Wide Pollution of Biomedical Data: Consequence of the Search for Hub Genes of Hepatocellular Carcinoma Without Spatiotemporal Consideration

Detecting and Quantitating Low Fraction DNA Variants with Low-Depth Sequencing

Case report: A case study of variant calling pipeline selection effect on the molecular diagnostics outcome

Optimization of a deep mutational scanning workflow to improve quantification of mutation effects on protein–protein interactions

Benchmarking and optimization of a high‐throughput sequencing based method for transgene sequence variant analysis in biotherapeutic cell line development

Improving estimates of negative selection in human genome using CAPS