piRNA in Machine-Learning-Based Diagnostics of Colorectal Cancer

Sienna Li, Valentina L Kouznetsova, Santosh Kesari, Igor F Tsigelny
DOI: https://doi.org/10.3390/molecules29184311
Impact factor: 4.6
2024-09-11
Molecules
Abstract: Objective biomarkers are crucial for early diagnosis, which enables timely treatment and raises survival rates. We investigated whether small non-coding PIWI-interacting RNAs (piRNAs) and their transcripts could serve as biomarkers for colorectal cancer (CRC). Using previously published data from serum samples of patients with CRC, we selected 13 differentially expressed piRNAs as potential biomarkers. From these data, we developed a machine learning (ML) model and created 1020 different piRNA sequence descriptors. Using the Naïve Bayes Multinomial classifier, we isolated the 27 most influential sequence descriptors and achieved an accuracy of 96.4%. To test the validity of the model, we used piRNAs from piRBase with known associations with CRC that had not been used to train the ML model; on these independent data, the model achieved an accuracy of 85.7%. For further validation, we also tested piRNAs from unrelated diseases, including piRNAs correlated with breast cancer but with no proven correlation to CRC. The model scored 44.4% on these piRNAs, showing that it can distinguish biomarkers of CRC from biomarkers of other diseases. These results show that our model is an effective tool for diagnosing colorectal cancer, and we believe it will prove useful in the future for diagnosing colorectal cancer and other diseases.