TEtrimmer: a novel tool to automate the manual curation of transposable elements

Jiangzhao Qian, Hang Xue, Shujun Ou, Jessica M Storer, Lisa Fuertauer, Mary C Wildermuth, Stefan Kusch, Ralph Panstruga
DOI: https://doi.org/10.1101/2024.06.27.600963
2024-07-02
Abstract: Transposable elements (TEs) are repetitive DNA sequences capable of moving within genomes. Accurate annotation and classification of TEs is crucial but challenging due to their sequence diversity and often fragmented occurrence. We present TEtrimmer, a novel tool to automate manual TE curation. TEtrimmer integrates multiple sequence alignment (MSA) clustering, MSA sequence extension, MSA cleaning, TE boundary definition, and TE classification, and provides report plots and a graphical user interface (GUI) application to inspect and improve results. Benchmarked on the genomes of six organisms from various kingdoms of life, TEtrimmer consistently improved the identification of intact TEs compared to established tools.
Bioinformatics