NeuralTE: an accurate approach for Transposable Element superfamily classification with multi-feature fusion

Kang Hu, Minghua Xu, Xin Gao, Jianxin Wang
DOI: https://doi.org/10.1101/2024.01.21.576519
2024-01-01
Abstract:

Motivation: Classifying Transposable Elements (TEs) at the superfamily level offers deeper insights into species variation and evolution. Recent advances in third-generation sequencing technologies have made a large number of genomes from non-model species available. However, existing TE classification methods suffer from several limitations, including the need to train multiple hierarchical classification models, the inability to classify at the superfamily level, and deficiencies in both accuracy and robustness. There is therefore an urgent need for an accurate TE classification method to improve genome annotation.

Results: In this study, we develop NeuralTE, a deep learning method designed to classify transposons at the superfamily level. To achieve accurate TE classification, we identify various structural features of transposons and use different combinations of k-mers for terminal repeats and internal sequences to uncover distinct patterns. Evaluation on all transposons from Repbase shows that NeuralTE outperforms existing deep learning, machine learning, and homology-based methods in classifying TEs. Testing on transposons from novel species likewise shows that NeuralTE outperforms existing methods. We also conduct TE annotation experiments on rice using different classification tools, and the results show that NeuralTE achieves annotations nearly identical to the gold standard, highlighting its robustness and accuracy in classifying transposons.

Availability: NeuralTE is publicly available at <https://github.com/CSU-KangHu/NeuralTE>.

Competing Interest Statement: The authors have declared no competing interest.
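To illustrate the idea of using different k-mer sizes for terminal repeats and internal sequences, the following is a minimal sketch and not NeuralTE's actual implementation. The function names (`kmer_frequencies`, `te_feature_vector`), the fixed terminal length, and the chosen k values are all hypothetical; the abstract does not state how terminals are delimited or which k values are used.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return normalized counts of every DNA k-mer, in lexicographic ACGT order."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])  # None for k-mers containing N or other symbols
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def te_feature_vector(seq, terminal_len=4, terminal_k=2, internal_k=1):
    """Concatenate k-mer features of the 5' terminal, internal region, and 3' terminal.

    Assumption: terminals are simply the first and last `terminal_len` bases.
    A real pipeline would locate terminal repeats structurally.
    """
    left = seq[:terminal_len]
    right = seq[-terminal_len:] if terminal_len else ""
    internal = seq[terminal_len:len(seq) - terminal_len]
    return (
        kmer_frequencies(left, terminal_k)
        + kmer_frequencies(internal, internal_k)
        + kmer_frequencies(right, terminal_k)
    )


if __name__ == "__main__":
    features = te_feature_vector("ACGTACGTACGT")
    print(len(features))
```

The resulting fixed-length vector could be fed to any classifier; using separate k values per region lets the terminal and internal parts contribute patterns at different resolutions.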