PUNCH2: explore the strategy for Intrinsically Disordered Protein predictor

Di Meng, Gianluca Pollastri
DOI: https://doi.org/10.1101/2024.10.03.616458
2024-10-03
Abstract: Intrinsically disordered proteins (IDPs) lack stable three-dimensional structures, which poses significant challenges for their computational prediction. This study introduces PUNCH2, a novel approach for predicting intrinsically disordered regions (IDRs) in proteins. We address key issues such as the scarcity of comprehensive IDR databases, effective feature extraction, and robust model architecture. By integrating sequences from experimental PDB datasets and fully disordered DisProt sequences for training, PUNCH2 achieves superior prediction accuracy. Various sequence embeddings, including One-Hot, MSA-based, and PLM-based methods, were evaluated, with ProtTrans-based embeddings showing the best performance. The optimal model architecture features multiple convolutional layers, enhancing predictive confidence. PUNCH2 and its lighter variant, PUNCH2-light, demonstrate high efficiency and accuracy in benchmarking against top predictors on the CAID2 dataset, offering promising tools for advancing IDP research and understanding.
Bioinformatics
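As a rough illustration of two ideas named in the abstract, the sketch below one-hot encodes a protein sequence and applies a single 1D convolution that yields one raw score per residue. It is not the PUNCH2 architecture: the function names (`one_hot_encode`, `conv1d_same`), the all-ones kernel, and the choice to map unknown residues to all-zero rows are assumptions made only for this example.

```python
# Illustrative only: names and parameters here are hypothetical, not from PUNCH2.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an L x 20 matrix; unknown residues (e.g. 'X') become all-zero rows."""
    matrix = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1.0
        matrix.append(row)
    return matrix


def conv1d_same(features, kernel, bias=0.0):
    """Single-output-channel 1D convolution with zero padding ('same' length).

    features: L x C matrix; kernel: K x C matrix with odd K.
    Returns a list of L raw scores, one per residue.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for 'same' padding")
    half = k // 2
    length = len(features)
    scores = []
    for pos in range(length):
        total = bias
        for offset in range(k):
            src = pos + offset - half
            if 0 <= src < length:  # positions outside the sequence act as zeros
                total += sum(f * w for f, w in zip(features[src], kernel[offset]))
        scores.append(total)
    return scores


if __name__ == "__main__":
    encoded = one_hot_encode("MKXA")
    # An all-ones kernel of width 3 simply counts known residues in each window.
    kernel = [[1.0] * len(AMINO_ACIDS) for _ in range(3)]
    print(conv1d_same(encoded, kernel))  # [2.0, 2.0, 2.0, 1.0]
```

A trained predictor would learn the kernel weights, stack several such layers with nonlinearities, and map the final per-residue scores to disorder probabilities; according to the abstract, ProtTrans-based embeddings outperformed One-Hot input, so the encoding step shown here is the simplest baseline rather than the best-performing one.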