Genome Re-Sequencing and Reannotation of the Escherichia coli ER2566 Strain and Transcriptome Sequencing under Overexpression Conditions.
Lizhi Zhou, Hai Yu, Kaihang Wang, Tingting Chen, Yue Ma, Yang Huang, Jiajia Li, Liqin Liu, Yuqian Li, Zhibo Kong, Qingbing Zheng, Yingbin Wang, Ying Gu, Ningshao Xia, Shaowei Li
DOI: https://doi.org/10.1186/s12864-020-06818-1
IF: 4.547
2020-01-01
BMC Genomics
Abstract:
Background: The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative and has been widely used for recombinant protein expression. However, like many other current RefSeq annotations, the annotation of the ER2566 strain is incomplete: gene names and miscellaneous RNAs are missing, and some pseudogene annotations have not been corrected. Here, we performed a systematic reannotation of the ER2566 genome, combining multiple annotation tools with manual revision, to provide a comprehensive understanding of the E. coli ER2566 strain, and used high-throughput sequencing to explore how the strain adapts under external pressure.
Results: The reannotation included noteworthy corrections to all protein-coding genes, excluded 120 hypothetical genes or pseudogenes, and added 65 coding sequences, 230 miscellaneous noncoding RNAs and 2 tRNAs. In addition, we manually examined all 194 pseudogenes in the RefSeq annotation and directly identified 144 (74%) of them as coding genes; the remaining pseudogenes without an explicit function were removed. We then used whole-genome sequencing and high-throughput RNA sequencing to assess mutational adaptations under consecutive subculture or overexpression burden. Whereas no mutations were detected in response to consecutive subculture, overexpression of the human papillomavirus type 16 capsid led to the identification of a mutation in the transcribed RNA (position 1,094,824, within the 3′ non-coding region) located 19 bp from the lacI gene; this mutation was not detected at the genomic level by Sanger sequencing.
Conclusions: The ER2566 strain is used by both the general scientific community and the biotechnology industry. Reannotation of the E. coli ER2566 strain not only improved the RefSeq data but also uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor. We propose that our pipeline might offer a universal method for the fast and accurate reannotation of other bacterial genomes. This study may facilitate a better understanding of gene function in the ER2566 strain under external burden and provide more clues for engineering bacteria for biotechnological applications.
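As a rough illustration of what "combining multiple annotation tools with manual revision" can involve, the Python sketch below merges coding-sequence calls from several tools by simple majority vote and queues every call without enough support for manual review. This is not the authors' pipeline: the tool names, the toy coordinates, the `GeneCall` and `merge_calls` names, and the `min_support` threshold are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class GeneCall(NamedTuple):
    """One predicted coding sequence (hypothetical 1-based, inclusive coordinates)."""
    start: int
    end: int
    strand: str  # "+" or "-"


def merge_calls(
    calls_by_tool: Dict[str, List[GeneCall]], min_support: int = 2
) -> Tuple[List[GeneCall], List[GeneCall]]:
    """Split gene calls into consensus calls and calls needing manual review.

    A call counts as consensus when at least `min_support` tools predict the
    exact same (start, end, strand); every other call is queued for review.
    """
    # Record which tools support each distinct call.
    support: Dict[GeneCall, Set[str]] = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for call in calls:
            support[call].add(tool)

    consensus: List[GeneCall] = []
    review: List[GeneCall] = []
    # Sorting gives a stable, coordinate-ordered output.
    for call in sorted(support):
        if len(support[call]) >= min_support:
            consensus.append(call)
        else:
            review.append(call)
    return consensus, review


if __name__ == "__main__":
    # Hypothetical toy predictions from three unnamed annotation tools.
    predictions = {
        "tool_a": [GeneCall(100, 400, "+"), GeneCall(900, 1200, "-")],
        "tool_b": [GeneCall(100, 400, "+"), GeneCall(1500, 1800, "+")],
        "tool_c": [GeneCall(100, 400, "+"), GeneCall(900, 1200, "-")],
    }
    accepted, to_review = merge_calls(predictions)
    print("consensus:", accepted)
    print("manual review:", to_review)
```

Keying on exact coordinates is a deliberate simplification: gene callers often agree on the stop codon but disagree on the start codon, so a looser variant could key on the strand and stop-codon end only and leave start-site choice to the manual-review step.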