STARRPeaker: uniform processing and accurate identification of STARR-seq active regions

Donghoon Lee, Manman Shi, Jennifer Moran, Martha Wall, Jing Zhang, Jason Liu, Dominic Fitzgerald, Yasuhiro Kyono, Lijia Ma, Kevin P White, Mark Gerstein
DOI: https://doi.org/10.1186/s13059-020-02194-x
2020-12-08
Abstract: STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. An issue with the increased complexity and depth is that the coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases, such as GC content. Furthermore, STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed a negative binomial regression framework for uniformly processing STARR-seq data, called STARRPeaker. Moreover, to aid our effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to call enhancers in them comprehensively and without bias.
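The abstract describes the model only at a high level: read counts are overdispersed and confounded by covariates such as GC content and RNA thermodynamic stability, and a negative binomial regression is used to account for them. The Python sketch below illustrates one way such a model could score a single genomic bin: compute an expected output count from covariates through a log-linear mean, then ask how surprising the observed count is under a negative binomial distribution. It is not STARRPeaker's implementation. The covariate names (`log_input`, `gc`, `fold_energy`), the coefficient values, and the dispersion `THETA` are hypothetical, and the regression fit that would estimate them is not shown.

```python
import math


def nb_pmf(k: int, mu: float, theta: float) -> float:
    """Negative binomial probability of observing k reads, given mean mu and
    size (inverse dispersion) theta; variance = mu + mu**2 / theta."""
    log_p = (
        math.lgamma(k + theta)
        - math.lgamma(theta)
        - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def nb_upper_tail(k: int, mu: float, theta: float) -> float:
    """P(X >= k) under NB(mu, theta), computed as 1 - P(X <= k - 1)."""
    cdf = sum(nb_pmf(j, mu, theta) for j in range(k))
    return max(0.0, 1.0 - cdf)


def expected_count(covariates: dict, coefs: dict, intercept: float) -> float:
    """Log-linear mean: log(mu) = intercept + sum(beta_i * x_i)."""
    eta = intercept + sum(coefs[name] * value for name, value in covariates.items())
    return math.exp(eta)


# Hypothetical coefficients for illustration only; in practice they would be
# estimated by fitting a negative binomial regression across all bins.
INTERCEPT = 0.5
COEFS = {"log_input": 1.0, "gc": -0.8, "fold_energy": 0.02}
THETA = 5.0

# Hypothetical covariates for one bin: input coverage, GC fraction, and
# RNA folding free energy (kcal/mol).
bin_covariates = {"log_input": math.log(20), "gc": 0.45, "fold_energy": -30.0}
mu = expected_count(bin_covariates, COEFS, INTERCEPT)
observed = 80
p_value = nb_upper_tail(observed, mu, THETA)
print(f"expected={mu:.1f} observed={observed} p={p_value:.2e}")
```

The log-linear mean makes each covariate act multiplicatively on the expected count, and the size parameter `theta` lets the variance exceed the mean (variance = mu + mu²/theta), which is how a negative binomial model accommodates the overdispersion the abstract mentions. For very small p-values, summing the upper tail directly would be more numerically robust than `1 - cdf`.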