Fundamental Limits of Pooled-DNA Sequencing

Amir Najafi, Damoun Nashta-ali, Seyed Abolfazl Motahari, Mehrdad Khani, Babak H. Khalaj, Hamid R. Rabiee
DOI: https://doi.org/10.48550/arXiv.1604.04735
2016-04-19
Abstract: In this paper, fundamental limits in sequencing of a set of closely related DNA molecules are addressed. This problem is called pooled-DNA sequencing, and it encompasses many interesting problems such as haplotype phasing, metagenomics, and conventional pooled-DNA sequencing in the absence of tagging. From an information-theoretic point of view, we propose fundamental limits on the number and length of DNA reads required to achieve a reliable assembly of all the pooled DNA sequences. In particular, pooled-DNA sequencing from both noiseless and noisy reads is investigated. In the noiseless case, necessary and sufficient conditions for perfect assembly are derived. Moreover, asymptotically tight lower and upper bounds on the assembly error probability are obtained under a biologically plausible probabilistic model. For the noisy case, we propose two novel DNA read denoising methods, as well as corresponding upper bounds on assembly error probabilities. It is shown that, under mild conditions, the performance of reliable assembly converges to that of the noiseless regime when, for a given read length, the number of DNA reads is sufficiently large. Interestingly, the emergence of long-read DNA sequencing technologies in recent years suggests that our results are applicable in real-world settings.
Information Theory