Deep learning facilitates precise identification of disease-resistance genes in plants
Zhenya Liu, Xu Wang, Shuo Cao, Tingyue Lei, Yifu Chenzhu, Mengyan Zhang, Zhongqi Liu, Jianzhong Lu, Wenqi Ma, Bingxiong Su, Yiwen Wang, Yongfeng Zhou
DOI: https://doi.org/10.1101/2024.09.26.615248
2024-09-28
Abstract: The identification of plant disease-resistance genes is essential for understanding the plant immune system and for accelerating the breeding of disease-resistant crops. There is therefore a pressing need for a method that can accurately and comprehensively annotate resistance genes at genome-wide scale. In this study, we propose Evolutionary Scale Modeling for LRR (ESM-LRR), a novel approach based on a deep protein language model, to accurately identify leucine-rich repeat (LRR) domains, which are highly variable structures in disease-resistance proteins. ESM-LRR achieved its highest F1 score, 0.80, on a test set when 90% identity was used as the matching threshold. Building on ESM-LRR, we developed the Plant Disease-Resistance Gene Predictor (R-Predictor), a framework designed to simultaneously annotate 15 diverse domain topologies, extensively covering recently characterised resistance genes across the entire genome. R-Predictor integrates four modules, each of which outperforms existing methods (F1 score of 0.89 for RLKs and 0.88 for NLRs), demonstrating its high accuracy and practicality in annotating disease-resistance genes (R genes). R-Predictor was also applied to identify R genes in rice, tomato, and grapevine. Furthermore, AlphaFold3 was used to screen interactions across 1,116 protein pairs, formed between three NLRs identified exclusively by R-Predictor and 372 literature-reported plant-pathogen effectors, resulting in the identification of 15 putative NLR-effector complexes. Overall, this study presents a novel approach for advancing our understanding of plant immune mechanisms.
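The ESM-LRR result is an F1 score computed with a 90% identity threshold for deciding whether a predicted LRR domain matches a reference domain. The abstract does not say how identity is measured or how matches are paired, so the sketch below is only an illustration under stated assumptions: domains are hypothetical (start, end) residue intervals on one protein, "identity" is approximated by a hypothetical positional overlap ratio (shared residues divided by the longer interval), and each reference domain can be matched by at most one prediction, chosen greedily. It is not the authors' evaluation code.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start, end), 1-based inclusive residue positions.
Interval = Tuple[int, int]


def overlap_identity(a: Interval, b: Interval) -> float:
    """Assumed identity measure: shared residues divided by the longer interval."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if shared <= 0:
        return 0.0
    longer = max(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return shared / longer


def domain_f1(predicted: List[Interval], reference: List[Interval],
              threshold: float = 0.9) -> float:
    """F1 where a prediction is a true positive if it matches an unused
    reference domain with identity >= threshold (greedy one-to-one matching)."""
    used = set()
    tp = 0
    for p in predicted:
        # Find the best still-unmatched reference domain for this prediction.
        best_idx: Optional[int] = None
        best_score = 0.0
        for i, r in enumerate(reference):
            if i in used:
                continue
            score = overlap_identity(p, r)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is not None and best_score >= threshold:
            used.add(best_idx)
            tp += 1
    fp = len(predicted) - tp  # predictions with no qualifying match
    fn = len(reference) - tp  # reference domains never matched
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = [(10, 109), (200, 299)]
    predicted = [(12, 109), (205, 260), (400, 450)]
    # One match (0.98 >= 0.9); (205, 260) overlaps only 56% and (400, 450) not at all.
    print(round(domain_f1(predicted, reference), 3))  # 0.4
```

With one true positive, two false positives and one false negative, precision is 1/3 and recall is 1/2, giving F1 = 0.4. Raising or lowering `threshold` shows how strongly an F1 value depends on the matching criterion, which is why the abstract reports the threshold alongside the score.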
Bioinformatics