Exploring combinations in chemoinformatics: Toward a multidisciplinary view

Jose L. Medina-Franco, Edgar López-López, Johny R. Rodríguez-Pérez, Héctor F. Cortés-Hernández, Samuel Homberg
DOI: https://doi.org/10.26434/chemrxiv-2024-fcwnc
2024-08-02
Abstract: In Chemoinformatics, as in many other computational disciplines, it is common practice to identify the “single best” approach or methodology: for instance, the best fingerprint representation, the best virtual screening approach or protocol, the optimal representation of chemical space, or the best predictive model, to name a few. In molecular modeling, a typical example is finding the best docking program. However, it is also known that each approach has its advantages and limitations. Benchmark studies that compare different approaches to find the most appropriate solution often conclude that no single program is best. Yet the search for the “best” method remains common. The main goal of this work is to survey hybrid methodologies typically used in Chemoinformatics. The list of approaches is not exhaustive, but it aims to cover several representative applications. One of the major outcomes of the survey is that, for various purposes, individual methods do not perform as well as combinations of approaches, because each single method has inherent advantages and limitations.
Chemistry
What problem does this paper attempt to address?
The paper explores how different methods and techniques can be combined in chemoinformatics to overcome the limitations of single methods when dealing with complex problems. Specifically, it addresses the following key issues:

1. **Limitations of Single Methods**: In chemoinformatics and related computational disciplines, researchers often search for the "best" method or technique (e.g., the best fingerprint representation or virtual screening method). However, each method has its own advantages and limitations, and no single method performs optimally in all situations.
2. **Advantages of Method Combinations**: Given these limitations, the paper emphasizes the value of combining multiple methods. Integrating different methods and techniques can address complex problems more effectively and improve the accuracy of predictions and analyses.
3. **Multidisciplinary Perspective in Chemoinformatics**: The paper describes how chemoinformatics, as an emerging discipline, integrates knowledge and techniques from traditional disciplines, and how this multidisciplinary perspective fosters the emergence and development of new research areas.
4. **Improvement of Molecular Representation Methods**: The paper discusses the evolution of molecular representations, in particular how combining different methods and techniques yields more effective molecular descriptors. For example, the Extended Connectivity Fingerprint (ECFP) results from combining the Morgan algorithm with hashed fingerprint techniques.
5. **Optimization of Property Prediction**: For property prediction in drug discovery and other chemical applications, the paper suggests that consensus predictions and ensemble models often outperform single predictors. This is especially relevant when predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMETox) properties of compounds, where multiple data types need to be considered together.
6. **In-depth Exploration of Structure-Activity Relationships**: The paper explores how multiple structure representations can improve the modeling of structure-activity relationships (SAR) and structure-inactivity relationships (SIR), thereby improving the detection of activity cliffs.
7. **Development of Virtual Screening Strategies**: With the rapid expansion of chemical libraries, the paper discusses how to perform efficient virtual screening by combining multiple computational methods, including similarity-based search and data fusion techniques.

In summary, the paper addresses the limitations of single methods for complex chemical problems by exploring the combined use of different methods and techniques in chemoinformatics, with the goal of improving prediction accuracy and problem-solving capability.
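Point 4 notes that ECFP combines the Morgan algorithm with hashing. The Python sketch below illustrates that combination on a toy heavy-atom graph. At each iteration, every atom's identifier is updated from its neighbors' identifiers (the Morgan-style step), and each identifier is folded onto a fixed-length bit vector (the hashing step). This is a simplified, hypothetical implementation and not the reference ECFP algorithm, which among other differences uses a richer set of atom invariants and bond information. The helper names `stable_hash` and `circular_fingerprint`, the (element, degree) atom invariant, and the SHA-1-based hash are all assumptions made for illustration.

```python
import hashlib


def stable_hash(values):
    """Deterministic 32-bit integer hash of a sequence of values.

    Python's built-in hash() is salted per process for strings, so a
    digest from hashlib is used to keep fingerprints reproducible.
    """
    data = "|".join(str(v) for v in values).encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "little")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Simplified ECFP-like fingerprint, returned as a set of 'on' bit indices.

    atoms: element symbols of the heavy atoms (assumed toy input format)
    bonds: (i, j) atom-index pairs; bond orders are ignored in this sketch
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: initial identifiers from (element, heavy-atom degree)
    ids = [stable_hash([atoms[i], len(neighbors[i])]) for i in range(len(atoms))]
    bits = {x % n_bits for x in ids}

    # Morgan-style iterations: each new identifier combines the atom's own
    # identifier with the sorted identifiers of its neighbors, so the encoded
    # environment grows by one bond per iteration
    for iteration in range(1, radius + 1):
        ids = [
            stable_hash([iteration, ids[i]] + sorted(ids[j] for j in neighbors[i]))
            for i in range(len(atoms))
        ]
        # Hashing step: fold each identifier onto a fixed-length bit vector
        bits.update(x % n_bits for x in ids)
    return bits


ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_reordered = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
dimethyl_ether = circular_fingerprint(["C", "O", "C"], [(0, 1), (1, 2)])
print(ethanol == ethanol_reordered)  # True: atom numbering does not matter
```

Sorting the neighbor identifiers makes the result independent of atom numbering, which is the property the Morgan algorithm was designed to provide. In practice, a maintained toolkit implementation (for example, RDKit's Morgan fingerprints) would normally be used instead of a hand-rolled version.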
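Point 5 states that consensus predictions and ensemble models often outperform single predictors. The sketch below shows the two simplest consensus rules: averaging for a continuous endpoint and majority voting for a classification endpoint such as toxicity. The model outputs are invented toy values, and the function names are hypothetical. Reporting the spread across models as a rough disagreement signal is an added assumption, not something the summary attributes to the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def consensus_regression(predictions):
    """Average per-compound predictions from several models.

    predictions: one list per model, each holding one value per compound.
    Returns (consensus mean, population std. dev. across models) per compound.
    """
    return [(mean(vals), pstdev(vals)) for vals in zip(*predictions)]


def consensus_classification(predictions):
    """Majority vote per compound; ties go to the label seen first."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical outputs of three models for the same three compounds (toy values)
solubility_preds = [
    [-2.1, -3.4, -1.0],  # model A
    [-2.5, -3.0, -1.2],  # model B
    [-2.3, -3.2, -0.8],  # model C
]
tox_preds = [
    ["toxic", "non-toxic", "toxic"],      # model A
    ["toxic", "toxic", "non-toxic"],      # model B
    ["non-toxic", "non-toxic", "toxic"],  # model C
]

for mu, sd in consensus_regression(solubility_preds):
    print(f"{mu:.2f} +/- {sd:.2f}")
print(consensus_classification(tox_preds))  # ['toxic', 'non-toxic', 'toxic']
```

More elaborate ensembles weight models by validation performance or stack them with a meta-model. Plain averaging and voting are shown here only because they are the easiest to read.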
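Point 6 links the use of multiple structure representations to activity-cliff detection. One way to sketch this idea makes three assumptions: set-based fingerprints, Tanimoto similarity, and the structure-activity landscape index, SALI = |P_i - P_j| / (1 - similarity). Under these assumptions, similarities are averaged across representations before pairs are flagged. The thresholds (mean similarity of at least 0.6, potency difference of at least 2 log units), the compound names, and the fingerprint contents are hypothetical toy values, and the function name is illustrative.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def consensus_cliffs(potency, fingerprints, sim_cut=0.6, delta_cut=2.0):
    """Flag activity cliffs using the mean similarity over several representations.

    potency: {compound: pIC50-like value}
    fingerprints: {representation name: {compound: set of features}}
    Returns (i, j, mean similarity, SALI) tuples for flagged pairs.
    """
    cliffs = []
    for i, j in combinations(sorted(potency), 2):
        sim = mean([tanimoto(fp[i], fp[j]) for fp in fingerprints.values()])
        delta = abs(potency[i] - potency[j])
        if sim >= sim_cut and delta >= delta_cut:
            sali = delta / max(1.0 - sim, 1e-6)  # guard against identical fingerprints
            cliffs.append((i, j, round(sim, 3), round(sali, 2)))
    return cliffs


potency = {"A": 8.0, "B": 5.5, "C": 5.8}
fingerprints = {
    "fp1": {"A": {1, 2, 3, 4, 5}, "B": {1, 2, 3, 4, 6}, "C": {1, 2, 3, 4, 7}},
    "fp2": {"A": {10, 11, 12}, "B": {10, 11, 12, 13}, "C": {14, 15}},
}
print(consensus_cliffs(potency, fingerprints))  # [('A', 'B', 0.708, 8.57)]
print(consensus_cliffs(potency, {"fp1": fingerprints["fp1"]}))  # flags A-B and A-C
```

In this toy case, the pair A-C looks like a cliff under `fp1` alone but not once `fp2` is taken into account. This illustrates how a consensus over representations can filter out pairs whose apparent similarity depends on a single representation.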
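Point 7 mentions similarity searching combined with data fusion. The sketch below assumes set-based fingerprints and Tanimoto similarity and shows two common fusion ideas. The first is group fusion: each library compound is scored by its maximum similarity over several reference actives. The second is rank-based fusion of the resulting scores across two representations. Averaging ranks is only one of several fusion rules in use, and the function names, fingerprints, and compound IDs are hypothetical.

```python
from statistics import mean


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_max(library, references):
    """Score each library compound by its highest similarity to any reference active."""
    return {cid: max(tanimoto(fp, ref) for ref in references) for cid, fp in library.items()}


def mean_rank_fusion(score_dicts):
    """Fuse several score dicts (higher score = better) by averaging ranks.

    Returns compound IDs ordered from best to worst mean rank. Tied input
    scores keep dictionary order when ranked (a simplification).
    """
    ranks = {}
    for scores in score_dicts:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, cid in enumerate(ordered, start=1):
            ranks.setdefault(cid, []).append(rank)
    return sorted(ranks, key=lambda cid: mean(ranks[cid]))


# Toy library encoded with two hypothetical fingerprint types
library_fp1 = {"X": {1, 2, 3, 9}, "Y": {5, 6, 10, 11}, "Z": {12, 13}}
library_fp2 = {"X": {20, 40}, "Y": {30, 31, 32}, "Z": {20, 21, 30}}
refs_fp1 = [{1, 2, 3, 4}, {5, 6, 7, 8}]  # two known actives, fingerprint 1
refs_fp2 = [{20, 21}, {30, 31, 32}]      # the same actives, fingerprint 2

scores_fp1 = group_fusion_max(library_fp1, refs_fp1)
scores_fp2 = group_fusion_max(library_fp2, refs_fp2)
print(mean_rank_fusion([scores_fp1, scores_fp2]))  # ['Y', 'X', 'Z']
```

Here the first fingerprint alone would rank X first and the second would rank Y first. The fused list places Y on top because both representations rank it well.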