Integration of diverse bioactivity data into the Chemical Checker compound universe

Arnau Comajuncosa-Creus, Martino Bertoni, Miquel Duran-Frigola, Adria Fernandez-Torras, Oriol Guitart-Pla, Nils Kurzawa, Martina Locatelli, Yasmmin Martins, Elena Pareja-Lorente, Gema Rojans-Granado, Nicolas Soler, Eva Viesi, Patrick Aloy
DOI: https://doi.org/10.1101/2024.12.04.626832
2024-12-07
Abstract: Chemical signatures encode the physicochemical and structural properties of small molecules into numerical descriptors, forming the basis for chemical comparisons and search algorithms. The increasing availability of bioactivity data has extended compound representations to include biological effects, although bioactivity descriptors are often limited to a few well-documented molecules. To address this issue, we implemented a collection of deep neural networks that leverage the experimentally determined bioactivity data associated with small molecules to infer the missing bioactivity signatures for any compound of interest. However, unlike static chemical descriptors, these bioactivity signatures evolve dynamically with new data and processing strategies. Here, we present a computational protocol to modify existing bioactivity spaces and signatures or generate novel ones, describing the main steps needed to integrate diverse bioactivity data with the current knowledge, as catalogued in the Chemical Checker (CC), using the predefined data curation pipeline. We illustrate the protocol through four specific examples: the incorporation of new compounds into an already existing bioactivity space, a change in the data pre-processing without altering the underlying experimental data, and the creation of two novel bioactivity spaces from scratch, which are completed in under 9 hours using GPU computing. Overall, this protocol offers a guideline for installing, testing and running the CC data integration approach on user-provided data, with the aim of extending the annotation available for a limited number of small molecules to a larger chemical landscape.
Bioinformatics