Chemical Language Models for Molecular Design

Juergen Bajorath
DOI: https://doi.org/10.1002/minf.202300288
IF: 4.05
2023-11-29
Molecular Informatics
Abstract: In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN‐based encoder‐decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug‐target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be seamlessly predicted from protein sequence motifs. Novel off‐the‐beaten‐path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
Chemistry, Medicinal; Mathematical & Computational Biology; Computer Science, Interdisciplinary Applications
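The abstract notes that CLMs apply to any compound or target data that can be presented as a sequence and tokenized. As a rough illustration of that preprocessing step only, the following minimal sketch tokenizes a SMILES string and maps the tokens to integer IDs. It is not taken from the paper: the regular expression is a simplified, hypothetical pattern, and the function names, special tokens, and example molecule are assumptions chosen for illustration.

```python
import re
from typing import Dict, Iterable, List

# Simplified, hypothetical SMILES token pattern (an assumption, not the paper's):
# bracket atoms, the two-letter halogens Br and Cl, two-digit ring closures
# such as %12, and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, left to right."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer ID to each distinct token, after three special tokens."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Wrap a token list in begin/end markers and convert it to IDs."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]


# Example molecule chosen for illustration only.
example = "CC(=O)Nc1ccc(Cl)cc1"
tokens = tokenize_smiles(example)
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)
print(tokens)
print(ids)
```

Under the same assumptions, a protein sequence could be tokenized per residue with `list(sequence)`, so that compound and target sequences share one integer-encoded input format for an RNN or transformer.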