Accelerating discoveries in medicine using distributed vector representations of words

Matheus V.V. Berto, Breno L. Freitas, Carolina Scarton, João A. Machado-Neto, Tiago A. Almeida
DOI: https://doi.org/10.1016/j.eswa.2024.123566
IF: 8.5
2024-04-05
Expert Systems with Applications
Abstract: Over the years, several neural network architectures have been proposed to process and represent texts using dense vectors (known as word embeddings): mathematical representations that encode the meaning of words or phrases. Word embeddings can be computed by many different algorithms, usually trained on large amounts of textual data with the aim of capturing semantic relationships between words. These embeddings revolutionized many Natural Language Processing applications, enabling more accurate and nuanced language understanding. Recently, it was demonstrated that word embeddings can be employed to uncover latent knowledge, i.e., information that is implicit in a set of texts and would hardly be perceptible to humans. In this context, this study extends this strategy by combining different unsupervised models to accelerate discoveries in medicine. Our word embeddings were trained on a large corpus of medical papers related to Acute Myeloid Leukemia, a highly malignant form of cancer. Our study shows that established therapies could have been developed earlier: our system issued treatment-testing notifications up to 11 years before those therapies were first proposed. The results show the potential of uncovering latent knowledge from the biomedical field to enable faster and more efficient drug testing for medical discoveries.
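As a rough illustration of the general idea behind embedding-based latent knowledge discovery, the sketch below ranks candidate terms by cosine similarity to a target term. It is not the authors' pipeline: the vectors are made-up toy values, and the names `cosine`, `rank_candidates`, `compound_a`, `compound_b`, and `compound_c` are hypothetical. In practice the vectors would come from embeddings trained on a domain corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for zero vectors; treat as 0
    return dot / (norm_u * norm_v)

def rank_candidates(embeddings, target, candidates):
    """Sort candidate terms by similarity to the target term, highest first."""
    target_vec = embeddings[target]
    scored = [
        (term, cosine(embeddings[term], target_vec))
        for term in candidates
        if term in embeddings  # skip terms the model has never seen
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors (assumption: invented values, not trained embeddings).
toy_embeddings = {
    "leukemia": [0.9, 0.1, 0.3],
    "compound_a": [0.8, 0.2, 0.4],
    "compound_b": [0.1, 0.9, 0.2],
    "compound_c": [0.5, 0.5, 0.5],
}

ranking = rank_candidates(
    toy_embeddings, "leukemia", ["compound_a", "compound_b", "compound_c"]
)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

Candidates that score highly despite rarely or never co-occurring with the target term in the training texts are the kind of signal such an approach would flag for closer review.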
computer science, artificial intelligence, engineering, electrical & electronic, operations research & management science