Abstract: Introduction: Lung cancer is one of the most frequent neoplasms worldwide, with approximately 2.2 million new cases and 1.8 million deaths each year. The expression level of programmed death ligand-1 (PDL1) has a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that mainly affects children. Identifying new biomarkers for these two diseases can significantly advance their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, provide a powerful avenue for finding new biomarkers for various diseases. Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, a graph attention network, and a convolutional neural network to learn the biological features of the lncRNAs and diseases from their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA–disease associations (LDAs). Finally, LDAenDL is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma. Results: LDAenDL achieved the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA–disease pairs, respectively, on Dataset 1, and 0.9490, 0.9157, and 0.9708 on Dataset 2. Under the same three cross-validation settings, it obtained AUPRs of 0.8903, 0.9061, and 0.9166 on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed four classical LDA prediction methods (SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies suggest that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, that SNHG3 may be associated with PDL1 in lung cancer, and that HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma. Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.
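
The Gaussian kernel similarity named in the Methods can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes the Gaussian interaction profile kernel commonly used in association prediction: the similarity of two lncRNAs is exp(-γ · squared distance between their association profiles), with γ normalized by the mean squared profile norm. The function name `gaussian_kernel_similarity`, the parameter `gamma_prime`, and the toy matrix are hypothetical and are not taken from LDAenDL.

```python
import math


def gaussian_kernel_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`.

    Assumed form (not confirmed by the abstract):
        K(i, j) = exp(-gamma * ||row_i - row_j||^2)
        gamma   = gamma_prime / mean_i(||row_i||^2)
    """
    n = len(assoc)
    # The mean squared norm of the profiles sets the kernel bandwidth.
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq_norm

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * sq_dist)
    return sim


# Toy lncRNA-disease association matrix (3 lncRNAs x 4 diseases),
# invented purely for illustration.
toy_assoc = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]

# lncRNA similarity: compare rows of the association matrix.
lnc_sim = gaussian_kernel_similarity(toy_assoc)

# Disease similarity: apply the same kernel to the transposed matrix.
dis_sim = gaussian_kernel_similarity([list(col) for col in zip(*toy_assoc)])
```

In this toy example the first two lncRNAs share an associated disease, so they score as more similar to each other than either does to the third. In a pipeline like the one described, such a matrix would be one of the inputs to the similarity networks, alongside the functional and semantic similarities.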

SLNL: A Novel Method for Gene Selection and Phenotype Classification

LncLSTA: A Versatile Predictor Unveiling Subcellular Localization of lncRNAs Through Long-Short Term Attention

Sparse Logistic Regression with a L1/2 Penalty for Gene Selection in Cancer Classification

Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM

SPLSN: An efficient tool for survival analysis and biomarker selection

An integrated bioinformatics approach to early diagnosis, prognosis and therapeutics of non-small-cell lung cancer

A Novel Genomic Selection Method Combining GBLUP and LASSO

Biomarker Identification Based on the L1 + L1 Penalized Model

A biological network-based regularized artificial neural network model for robust phenotype prediction from gene expression data

Network-based logistic regression integration method for biomarker identification

An NMF-L2,1-Norm Constraint Method for Characteristic Gene Selection

A Novel Probability Model for LncRNA–Disease Association Prediction Based on the Naïve Bayesian Classifier

Natural Learning

Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

DeepLGP: a Novel Deep Learning Method for Prioritizing lncRNA Target Genes

A Novel Multiclass Gene Selection Method based on SVM/MLP Cross Validation

Biomarker Identification and Cancer Classification Based on Microarray Data Using Laplace Naive Bayes Model with Mean Shrinkage

LDNFSGB: prediction of long non-coding RNA and disease association using network feature similarity and gradient boosting

A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data

Integrative Analysis of Prognosis Data on Multiple Cancer Subtypes using Penalization

A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification