Analysis of Barcode sequence features to find anomalies due to amplification Bias

Chandrima Sarkar, Raamesh Deshpande, Chad Myers
DOI: https://doi.org/10.48550/arXiv.1402.6775
2014-02-27
Computational Engineering, Finance, and Science
Abstract: In this paper we investigate whether barcode sequence features can predict the read count ambiguities that arise in PCR-based next-generation sequencing. Our methods are mutual-information-based motif discovery and Lasso regression on features generated from the barcode sequence. The results indicate a certain degree of correlation between the motifs discovered in the sequences and the read counts. Our main contribution is a thorough investigation of barcode features, which yields useful information about the significance of the sequence features, and of the sequences containing the discovered motifs, in predicting read counts.
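As a rough illustration of what mutual-information-based motif scoring could look like, the sketch below ranks k-mers by the mutual information between their presence in a barcode and a high/low read-count label. It is not the authors' implementation: the function names (`mutual_information`, `kmer_scores`), the median split, the choice of k = 3, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def kmer_scores(barcodes, read_counts, k=3):
    """Score every k-mer by MI between its presence and a high/low read-count label.

    Assumption: read counts are binarized at the (upper) median; the paper's
    actual discretization is not given in the abstract.
    """
    median = sorted(read_counts)[len(read_counts) // 2]
    labels = [int(c >= median) for c in read_counts]
    kmers = {b[i:i + k] for b in barcodes for i in range(len(b) - k + 1)}
    return {
        m: mutual_information([int(m in b) for b in barcodes], labels)
        for m in kmers
    }


# Toy, made-up data purely for illustration (not from the paper).
barcodes = ["ACGTACGT", "GGGTTTAA", "ACGTTTGG", "TTTAAACC"]
read_counts = [900, 100, 850, 120]
scores = kmer_scores(barcodes, read_counts, k=3)

# Print the highest-scoring k-mers; ties are possible on such a small sample.
for motif, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(motif, round(score, 3))
```

High-scoring k-mers from a step like this could then serve as binary features in a Lasso regression against read counts, which is the second technique the abstract names.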