An Efficient Consolidation of Word Embedding and Deep Learning Techniques for Classifying Anticancer Peptides: FastText+BiLSTM

Onur Karakaya, Zeynep Hilal Kilimci
2023-09-21
Abstract: Anticancer peptides (ACPs) are a group of peptides that exhibit antineoplastic properties. The utilization of ACPs in cancer prevention can present a viable substitute for conventional cancer therapeutics, as they possess a higher degree of selectivity and safety. Recent scientific advancements have generated interest in peptide-based therapies, which offer the advantage of efficiently treating intended cells without negatively impacting normal cells. However, as the number of peptide sequences continues to increase rapidly, developing a reliable and precise prediction model becomes a challenging task. In this work, our motivation is to advance an efficient model for categorizing anticancer peptides by consolidating word embedding and deep learning models. First, Word2Vec and FastText are evaluated as word embedding techniques for extracting features from peptide sequences. Then, the outputs of the word embedding models are fed into the deep learning approaches CNN, LSTM, and BiLSTM. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on two datasets widely used in the literature, ACPs250 and Independent. Experimental results show that the proposed model improves classification accuracy compared with state-of-the-art studies. The proposed combination, FastText+BiLSTM, achieves 92.50% accuracy on the ACPs250 dataset and 96.15% accuracy on the Independent dataset, thereby setting a new state of the art.
Machine Learning, Artificial Intelligence, Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
The aim of this paper is to develop an efficient model to classify anticancer peptides (ACPs). Traditional cancer treatments such as chemotherapy and radiotherapy, although effective, suffer from poor selectivity, significant side effects, and the tendency of tumor cells to develop resistance, so there is an urgent need for more effective treatments. ACPs have attracted attention as a promising alternative because they can selectively target cancer cells without affecting normal cells. However, as the number of known peptide sequences grows rapidly, developing reliable and accurate prediction models becomes increasingly challenging. Currently, the identification and classification of ACPs rely mainly on time-consuming and labor-intensive experimental techniques, such as high-throughput screening and mass spectrometry. To address these issues, researchers have begun exploring computational methods to predict and classify ACPs. In this paper, the authors propose a framework that combines word embedding techniques with deep learning models to classify ACPs. First, they evaluate Word2Vec and FastText as word embedding techniques for extracting features from peptide sequences; then, the output of the word embedding models is fed into the deep learning architectures CNN, LSTM, and BiLSTM. Experimental results show that the proposed combination, FastText+BiLSTM, achieves classification accuracies of 92.50% on the ACPs250 dataset and 96.15% on the Independent dataset, setting a new state of the art.
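As a rough illustration of how a peptide sequence can be prepared for a word embedding model, the sketch below splits a sequence into overlapping k-mers that play the role of "words", and then expands one k-mer into FastText-style character n-grams with `<` and `>` boundary markers. The tokenization scheme, the function names `kmer_tokens` and `char_ngrams`, the value `k=3`, the n-gram range, and the example sequence are all assumptions made for illustration; the summary above does not state how the authors tokenized their peptides.

```python
def kmer_tokens(sequence, k=3):
    """Split a peptide sequence into overlapping k-mers, treated as 'words'.

    Assumption: overlapping k-mers with stride 1; the paper's actual
    tokenization is not described in this summary.
    """
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def char_ngrams(word, n_min=2, n_max=3):
    """Return FastText-style subword units for one word.

    The word is wrapped in '<' and '>' boundary markers and every
    character n-gram with n_min <= n <= n_max is collected.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    return grams


if __name__ == "__main__":
    peptide = "GLFDIVKKV"  # hypothetical sequence, not taken from either dataset
    words = kmer_tokens(peptide, k=3)
    print(words)
    print(char_ngrams(words[0]))
```

Token lists of this kind are what a Word2Vec or FastText trainer would typically consume as "sentences". The subword n-grams are what allow FastText to build a vector for a k-mer it never saw during training, which Word2Vec cannot do.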