Earl Grey: a fully automated user-friendly transposable element annotation and analysis pipeline

Tobias Baril, James Galbraith, Alex Hayward
DOI: https://doi.org/10.1101/2022.06.30.498289
2024-02-20
Abstract: Transposable elements (TEs) are major components of eukaryotic genomes and are implicated in a range of evolutionary processes. Yet, TE annotation and characterisation remain challenging, particularly for non-specialists, since existing pipelines are typically complicated to install, run, and extract data from. Current methods of automated TE annotation are also subject to issues that reduce overall quality, particularly: (i) fragmented and overlapping TE annotations, leading to erroneous estimates of TE count and coverage; and (ii) repeat models represented by short sections of total TE length, with poor capture of 5’ and 3’ ends. To address these issues, we present Earl Grey, a fully automated TE annotation pipeline designed for user-friendly curation and annotation of TEs in eukaryotic genome assemblies. Using nine simulated genomes and an annotation of , we show that Earl Grey outperforms current widely used TE annotation methodologies in ameliorating the issues mentioned above, whilst scoring highly in benchmarking for TE annotation and classification, and being robust across genomic contexts. Earl Grey is a comprehensive and fully automated TE annotation toolkit that provides researchers with paper-ready summary figures and outputs in standard formats compatible with other bioinformatics tools. Earl Grey has a modular format, with great scope for the inclusion of additional modules focussed on further quality control and tailored analyses in future releases.
Bioinformatics