Abstract: The number of precision oncology clinical trials has increased dramatically in the era of precision medicine, and locating these trials can help researchers, physicians, and patients learn about the latest cancer treatment options or participate in such trials. However, genomic variants appear in narrative clinical trial documents in unstructured, non-standardized form, which makes precision oncology clinical trials difficult to search. This study aims to extract and standardize genomic variants automatically in order to locate precision oncology clinical trials. Under the eligibility criteria of these trials, patients are included or excluded according to their genomic variants, which may be individual variants or category variants that represent a class of individual variants. To extract both individual and category variants, we designed 5 classes of entities (variation, gene, exon, qualifier, and negation), 4 types of relations for composing variants, and 4 types of relations for representing semantic relationships between variants. We then developed an information extraction system with two modules: (1) a cascade extraction module based on the pre-trained model BERT, comprising sentence classification (SC), named entity recognition (NER), and relation classification (RC); and (2) a variant normalization module based on rules and dictionaries, comprising entity normalization (EN) and post-processing (PP). The system was developed and evaluated on the eligibility criteria texts of 400 non-small cell lung cancer clinical trials downloaded from ClinicalTrials.gov. In the experiments, end-to-end extraction achieved an F1 score of 0.84. The system was further evaluated on an additional 50 multi-cancer clinical trial texts and achieved an F1 score of 0.71, demonstrating the generalizability of our system.
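The abstract describes how entities and relations are combined into variants only at a high level. The following is a minimal Python sketch, assuming the BERT stages (SC, NER, RC) have already produced typed entities and relations. It shows one way such output could be composed into variant records and normalized with dictionaries and simple rules. The four composition relation labels are placeholders, since the abstract gives only their number. The dictionaries and the names `Entity`, `Relation`, `Variant`, `normalize`, and `compose_variants` are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# The five entity classes listed in the abstract.
ENTITY_TYPES = {"variation", "gene", "exon", "qualifier", "negation"}

# Hypothetical labels for the 4 composition relations (names are placeholders).
REL_GENE, REL_EXON, REL_QUALIFIER, REL_NEGATION = (
    "variation-gene", "variation-exon", "variation-qualifier", "variation-negation")


@dataclass
class Entity:
    eid: int
    etype: str  # one of ENTITY_TYPES
    text: str

    def __post_init__(self):
        if self.etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.etype}")


@dataclass
class Relation:
    head: int  # id of a variation entity
    tail: int  # id of the entity composed with it
    rtype: str


@dataclass
class Variant:
    variation: str
    gene: Optional[str] = None
    exon: Optional[str] = None
    qualifiers: list = field(default_factory=list)
    negated: bool = False


# Toy normalization dictionaries for the EN step (illustrative only).
GENE_DICT = {"her2": "ERBB2", "erbb2": "ERBB2", "egfr": "EGFR", "alk": "ALK"}
VARIATION_DICT = {"del": "deletion", "deletion": "deletion",
                  "mutation": "mutation", "mutations": "mutation"}


def normalize(entity: Entity) -> str:
    """Map a surface form to a canonical string using dictionaries and simple rules."""
    key = entity.text.lower()
    if entity.etype == "gene":
        return GENE_DICT.get(key, entity.text.upper())
    if entity.etype == "variation":
        return VARIATION_DICT.get(key, entity.text)
    if entity.etype == "exon":
        m = re.search(r"\d+", entity.text)  # rule: "exon 19" -> "19"
        return m.group(0) if m else entity.text
    return entity.text


def compose_variants(entities, relations):
    """Build one Variant per variation entity from the relations it heads."""
    by_id = {e.eid: e for e in entities}
    variants = []
    for var in (e for e in entities if e.etype == "variation"):
        v = Variant(variation=normalize(var))
        for rel in (r for r in relations if r.head == var.eid):
            tail = by_id[rel.tail]
            if rel.rtype == REL_GENE:
                v.gene = normalize(tail)
            elif rel.rtype == REL_EXON:
                v.exon = normalize(tail)
            elif rel.rtype == REL_QUALIFIER:
                v.qualifiers.append(tail.text)
            elif rel.rtype == REL_NEGATION:
                v.negated = True  # the criterion excludes this variant
        variants.append(v)
    return variants


# Usage: a made-up criterion fragment "No EGFR exon 19 del".
entities = [Entity(0, "negation", "No"), Entity(1, "gene", "EGFR"),
            Entity(2, "exon", "exon 19"), Entity(3, "variation", "del")]
relations = [Relation(3, 1, REL_GENE), Relation(3, 2, REL_EXON),
             Relation(3, 0, REL_NEGATION)]
print(compose_variants(entities, relations))
# [Variant(variation='deletion', gene='EGFR', exon='19', qualifiers=[], negated=True)]
```

Keying composition on relation labels rather than on entity adjacency mirrors the abstract's separation of NER and RC: the sketch trusts whatever links the RC stage predicts, so a gene mentioned far from its variation can still be attached. A category variant such as "HER2 mutations" passes through the same path and simply has no exon or specific alteration.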
In conclusion, we developed an information extraction system that extracts both individual and category genomic variants from clinical trial texts, and the experimental results show that the extracted variants have significant potential for locating precision oncology clinical trials.

Automatic Extraction of Genomic Variants for Locating Precision Oncology Clinical Trials