Lite-SeqCNN: A Light-Weight Deep CNN Architecture for Protein Function Prediction

Vikash Kumar, Akshay Deepak, Ashish Ranjan, Aravind Prakash
DOI: https://doi.org/10.1109/TCBB.2023.3240169
Abstract: The short- and long-range interactions among amino acids in a protein sequence are primarily responsible for the function the protein performs. Convolutional neural networks (CNNs) have recently produced promising results on sequential data, including NLP tasks and protein sequences. However, a CNN's strength lies primarily in capturing short-range interactions, and CNNs are less effective at capturing long-range interactions. Dilated CNNs, on the other hand, capture both short- and long-range interactions well because their receptive fields vary from short to long. Further, CNNs are quite light-weight in terms of trainable parameters, whereas most existing deep learning solutions for protein function prediction (PFP) are multi-modal, rather complex, and heavily parametrized. In this paper, we propose Lite-SeqCNN, a simple, light-weight, sequence-only PFP framework based on sub-sequences and dilated CNNs. By varying dilation rates, Lite-SeqCNN efficiently captures both short- and long-range interactions and has 0.50 to 0.75 times fewer trainable parameters than contemporary deep learning models. Further, Lite-SeqCNN+, an ensemble of three Lite-SeqCNNs developed with different segment sizes, produces even better results than the individual models. The proposed architecture improves on the state-of-the-art approaches Global-ProtEnc Plus, DeepGOPlus, and GOLabeler by up to 5% on three prominent datasets curated from the UniProt database.
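The abstract's central mechanism, that dilation widens a convolution's receptive field without adding trainable parameters, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names (`dilated_conv1d`, `receptive_field`, `segment`, `ensemble_average`), the `-` padding symbol, non-overlapping segmentation, and plain score averaging for the ensemble are all hypothetical choices made for illustration; the abstract does not say how sequences are split into segments or how the three Lite-SeqCNN outputs are combined.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in deep-learning libraries)
    whose kernel taps are spaced `dilation` positions apart. The number of
    weights is len(kernel) regardless of the dilation rate."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of a stack of stride-1 convolutions, one per dilation rate."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def segment(seq: str, size: int) -> List[str]:
    """Hypothetical segmentation: split a protein sequence into non-overlapping
    sub-sequences of length `size`, right-padding the last one with '-'."""
    return [seq[i:i + size].ljust(size, "-") for i in range(0, len(seq), size)]


def ensemble_average(score_lists: List[List[float]]) -> List[float]:
    """Assumed combination rule: average per-label scores from several models
    (e.g. models trained with different segment sizes)."""
    return [sum(col) / len(col) for col in zip(*score_lists)]
```

For example, `dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 1, 1], 2)` returns `[9, 12]`, since each output sums inputs two positions apart. Three kernel-size-3 layers with dilations 1, 2 and 4 see 15 positions (`receptive_field(3, [1, 2, 4])`), versus 7 for the same three layers without dilation, using exactly the same number of weights.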