Quality control protocol for the raw sequencing reads of the eastern banjo frog v1

Qiye Li, Qunfei Guo, Yang Zhou, Huishuang Tan, Terry Bertozzi, Yuanzhen Zhu, Ji Li, Stephen Donnellan, Guojie Zhang
DOI: https://doi.org/10.17504/protocols.io.bghvjt66
2020-01-01
Abstract: The raw sequencing data from the 14 libraries (170 bp × 1, 250 bp × 1, 500 bp × 1, 800 bp × 1, 2 kb × 3, 5 kb × 3, 10 kb × 2, and 20 kb × 2) were subjected to strict quality control with SOAPnuke (v1.5.3) prior to downstream analyses. Briefly, for the raw reads from each library, we trimmed the unreliable bases at the head and tail of each read, where the per-position GC content was unbalanced or the per-position base quality was low across all reads; we removed read pairs with adapter contamination or with a high proportion of low-quality or unknown (N) bases; we removed duplicate read pairs resulting from polymerase chain reaction (PCR) amplification (i.e. PCR duplicates); and we removed overlapping read pairs in all libraries except the 170 bp and 250 bp libraries, in which the paired reads were expected to overlap.
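The filtering steps summarized in the abstract can be illustrated with a minimal Python sketch. This is not SOAPnuke's implementation or command-line interface. Every function name, the adapter string, and every threshold (trim lengths, N fraction, quality cutoff, minimum overlap) is a hypothetical placeholder chosen for illustration, since the abstract does not state the parameter values. Adapter and overlap detection use exact string matching only, and the overlap test assumes a forward/reverse paired-end orientation.

```python
from typing import Iterable, Iterator, Set, Tuple

Read = Tuple[str, str]   # (sequence, Phred+33 quality string)
Pair = Tuple[Read, Read]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def trim(read: Read, head: int, tail: int) -> Read:
    """Drop `head` bases from the start and `tail` bases from the end."""
    seq, qual = read
    end = len(seq) - tail
    return seq[head:end], qual[head:end]


def n_fraction(seq: str) -> float:
    """Fraction of unknown (N) bases; an empty read counts as all-N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def low_quality_fraction(qual: str, min_q: int = 10, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is below `min_q` (assumed cutoff)."""
    if not qual:
        return 1.0
    return sum(1 for c in qual if ord(c) - offset < min_q) / len(qual)


def has_adapter(seq: str, adapter: str, min_match: int = 10) -> bool:
    """Exact match of the first `min_match` adapter bases anywhere in the read."""
    return adapter[:min_match] in seq


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pairs_overlap(seq1: str, seq2: str, min_overlap: int = 10) -> bool:
    """True if a suffix of read 1 equals a prefix of reverse-complemented read 2."""
    rc2 = reverse_complement(seq2)
    for k in range(min(len(seq1), len(rc2)), min_overlap - 1, -1):
        if seq1[-k:] == rc2[:k]:
            return True
    return False


def filter_pairs(
    pairs: Iterable[Pair],
    adapter: str,
    head: int = 0,
    tail: int = 0,
    max_n: float = 0.1,        # assumed threshold, not from the protocol
    max_low_q: float = 0.5,    # assumed threshold, not from the protocol
    expect_overlap: bool = False,
) -> Iterator[Pair]:
    """Yield read pairs that survive trimming and all four filters."""
    seen: Set[Tuple[str, str]] = set()
    for r1, r2 in pairs:
        r1, r2 = trim(r1, head, tail), trim(r2, head, tail)
        if any(has_adapter(r[0], adapter) for r in (r1, r2)):
            continue  # adapter contamination
        if any(n_fraction(r[0]) > max_n or low_quality_fraction(r[1]) > max_low_q
               for r in (r1, r2)):
            continue  # too many N or low-quality bases
        key = (r1[0], r2[0])
        if key in seen:
            continue  # identical pair already seen: treated as a PCR duplicate
        seen.add(key)
        if not expect_overlap and pairs_overlap(r1[0], r2[0]):
            continue  # overlapping pair in a library where none is expected
        yield r1, r2
```

In this sketch, `expect_overlap=True` would correspond to the 170 bp and 250 bp libraries, where overlapping pairs are kept; for all other libraries the default `False` discards them.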