Benchmarking digital PCR partition classification methods with empirical and simulated duplex data

Yao Chen, Ward De Spiegelaere, Wim Trypsteen, David Gleerup, Jo Vandesompele, Antoon Lievens, Matthijs Vynck, Olivier Thas
DOI: https://doi.org/10.1093/bib/bbae120
IF: 9.5
2024-03-27
Briefings in Bioinformatics
Abstract: Digital PCR (dPCR) is a highly accurate technique for the quantification of target nucleic acid(s). It has shown great potential in clinical applications, such as tumor liquid biopsy and validation of biomarkers. Accurate classification of partitions based on end-point fluorescence intensities is crucial to avoid biased estimators of the concentration of the target molecules. We have evaluated many clustering methods, from general-purpose methods to specific methods for dPCR and flow cytometry, on both simulated and real-life data. Clustering method performance was evaluated by simulating various scenarios. Based on our extensive comparison of clustering methods, we describe the limits of these methods and formulate guidelines for choosing an appropriate method. In addition, we have developed a novel method for simulating realistic dPCR data. The method is based on a mixture distribution of a Poisson point process and a skew-$t$ distribution, which enables the generation of irregular cluster shapes and of randomly scattered partitions between clusters (‘rain’), as commonly observed in dPCR data. Users can fine-tune the model parameters and generate labeled datasets, using their own data as a template. Furthermore, the database of experimental dPCR data, augmented with the labeled simulated data, can serve as training and testing data for new clustering methods. The simulation method is available as an R Shiny app.
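To make the simulation idea more concrete, the following is a minimal Python sketch of labeled duplex data with skewed, heavy-tailed clusters and rain. It is not the authors' R Shiny implementation. The function names, fluorescence levels, scale, skewness, degrees of freedom, and rain rate are all hypothetical, and rain is approximated here by moving a random fraction of positive partitions to a uniform position between the negative and positive levels, a simple stand-in for the Poisson point process component described in the abstract.

```python
import numpy as np


def skew_t(rng, size, loc, scale, alpha, df):
    """Draw univariate skew-t variates: a skew-normal draw divided by
    the square root of a scaled chi-square draw (Azzalini-type construction)."""
    delta = alpha / np.sqrt(1.0 + alpha**2)
    u0 = np.abs(rng.standard_normal(size))
    u1 = rng.standard_normal(size)
    skew_normal = delta * u0 + np.sqrt(1.0 - delta**2) * u1
    w = rng.chisquare(df, size) / df
    return loc + scale * skew_normal / np.sqrt(w)


def simulate_duplex(rng, n_partitions=20000, lam=(0.3, 0.2), rain_rate=0.02):
    """Return (intensities, labels) for a simulated duplex experiment.

    labels: 0 = double negative, 1 = target 1 only,
            2 = target 2 only,  3 = double positive.
    All numeric settings below are illustrative assumptions.
    """
    # Number of target copies per partition and per channel (Poisson loading).
    copies = rng.poisson(lam, size=(n_partitions, 2))
    positive = copies > 0
    labels = positive[:, 0] + 2 * positive[:, 1]

    low, high = 1000.0, 5000.0  # hypothetical negative/positive levels
    centers = np.where(positive, high, low)

    # Skew-t noise around each cluster center, channel by channel.
    x = np.column_stack([
        skew_t(rng, n_partitions, centers[:, j], 150.0, 2.0, 5.0)
        for j in range(2)
    ])

    # Rain stand-in: some positive signals land between the two levels.
    is_rain = positive & (rng.random((n_partitions, 2)) < rain_rate)
    frac = rng.random((n_partitions, 2))
    x = np.where(is_rain, low + frac * (high - low), x)
    return x, labels
```

Calling `simulate_duplex(np.random.default_rng(0))` yields a two-column intensity array with ground-truth labels, which is the kind of labeled dataset that a clustering method could be scored against.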
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses the problem of accurately classifying partitions in digital PCR (dPCR). Specifically, it evaluates the performance of various clustering methods in duplex (dual-target) experiments using both simulated and real data. Accurate partition classification is crucial to avoid bias in the estimated concentration of target molecules, especially in multi-target experiments, where manual classification may introduce bias and reduce precision. The study therefore aims to:

1. **Evaluate existing clustering methods**: Assess the performance of various clustering methods in dPCR partition classification using simulated and real data.
2. **Propose guidelines for selecting appropriate methods**: Describe the limitations of these methods based on extensive comparisons and provide guidelines for choosing suitable methods in different biologically relevant scenarios.
3. **Develop new simulation methods**: Propose a new method to generate realistic dPCR data, allowing users to fine-tune model parameters based on their own data and generate labeled datasets for training and testing new clustering methods.

In summary, the goal of this paper is to improve the accuracy and automation of dPCR data analysis, thereby providing more reliable results in clinical applications such as tumor liquid biopsy and biomarker validation.
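The link between classification errors and biased concentration estimates can be illustrated with the standard Poisson correction used in dPCR, where the mean number of copies per partition is estimated as -ln(1 - k/n) for k positive partitions out of n. The sketch below is an illustration under that assumption only. The function names are hypothetical, and it does not reproduce any estimator or benchmark from the paper.

```python
import math


def copies_per_partition(n_positive: int, n_total: int) -> float:
    """Poisson-corrected mean number of target copies per partition."""
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1.0 - n_positive / n_total)


def relative_bias(n_positive: int, n_total: int, n_false_negative: int) -> float:
    """Relative bias of the estimate when some truly positive partitions
    are misclassified as negative (for example, rain called negative)."""
    true_lam = copies_per_partition(n_positive, n_total)
    observed = copies_per_partition(n_positive - n_false_negative, n_total)
    return (observed - true_lam) / true_lam
```

For instance, `relative_bias(5000, 10000, 500)` is negative: calling 500 of 5000 truly positive partitions negative leads to an underestimate of the concentration.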