Abstract: Introduction: Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are, however, important challenges currently limiting the impact and scope of AI models. Areas covered: In this perspective, the authors discuss a range of data issues (bias, inconsistency, skewness, irrelevance, small size, high dimensionality), how they challenge AI models, and which issue-specific mitigations have been effective. Next, they point out the challenges faced by uncertainty quantification techniques aimed at enhancing and trusting the predictions from these AI models. They also discuss how conceptual errors, unrealistic benchmarks and performance misestimation can confound the evaluation of models and thus their development. Lastly, the authors explain how human bias, whether from AI experts or drug discovery experts, constitutes another challenge that can be alleviated by gaining more prospective experience. Expert opinion: AI models are often developed to excel on retrospective benchmarks that are unlikely to anticipate their prospective performance. As a result, only a few of these models are ever reported to have prospective value (e.g. by discovering potent and innovative drug leads for a therapeutic target). The authors have discussed what can go wrong in practice with AI for drug discovery, and hope that this will help inform the decisions of editors, funders, investors and researchers working in this area.

What problem does this paper attempt to address?

This paper addresses four groups of problems that limit the application of artificial intelligence (AI) in drug discovery: data challenges, uncertainty quantification, model evaluation, and researcher bias. Specifically:

1. **Data problems**:
   - **Biased data**: Even under ideal circumstances, the instances in a dataset may sample the label distribution unevenly, so the trained model generalizes poorly to unseen regions.
   - **Inconsistent data**: Data generated by different laboratories may differ in device calibration or sample preparation, which can degrade model generalization.
   - **Skewed data**: Especially in early-stage drug discovery, active molecules (the minority class) are far rarer than inactive molecules (the majority class), resulting in an imbalanced dataset.
   - **Irrelevant data**: Feature selection may retain features that are irrelevant to the prediction, which hurts model performance.
   - **Small-sized data**: With too few samples, supervised learning algorithms struggle to predict other samples accurately.
   - **High-dimensional data**: In biomarker discovery, the number of features far exceeds the number of samples, which makes generalization harder.
2. **Uncertainty quantification**:
   - Quantifying the uncertainty of a prediction, that is, its reliability, is crucial for decision-making. For example, estimating prediction uncertainty with Gaussian processes (GP) or conformal prediction (CP) can help select more reliable molecules.
3. **Model evaluation**:
   - **Conceptual errors**: For example, the concept of overfitting is often misinterpreted: a model that performs well on the training set but poorly on the test set is commonly considered untrustworthy.
   - **Performance misestimation**: Evaluating a model with inappropriate metrics or benchmarks may misestimate its performance. For example, ROC-AUC is not a suitable metric for highly imbalanced datasets.
   - **Unrealistic benchmarks**: Many benchmarks are too idealized to reflect how a model performs in practical applications.
4. **Researchers' bias**:
   - **Bias of AI experts**: AI experts tend to think that any problem can be solved with the right learning algorithm, but often lack domain knowledge, which leads to over-hyping of AI applications.
   - **Bias of drug discovery experts**: Drug discovery experts are often skeptical of AI applications, fearing that their own work will become unimportant or redundant. This defensive attitude can exacerbate future uncertainties.

By discussing these issues, the authors hope to help editors, funders, investors, and researchers better understand and address the challenges of AI in drug discovery.
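The point about ROC-AUC on highly imbalanced data can be illustrated with a small sketch. The data below are synthetic and purely hypothetical (1000 inactives, 10 actives, hand-picked scores), and both metrics are implemented from their textbook definitions in plain Python rather than taken from the paper. Average precision is used here only as one example of a metric that is sensitive to class imbalance; the source does not prescribe it.

```python
def roc_auc(labels, scores):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@rank taken at the rank of each active (no special tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits  # assumes at least one active is present


# Hypothetical screening set: inactives spread over [0, 1), actives all near 0.9.
inactive_scores = [i / 1000 for i in range(1000)]
active_scores = [0.9 + 0.00005 * (j + 1) for j in range(10)]

labels = [0] * len(inactive_scores) + [1] * len(active_scores)
scores = inactive_scores + active_scores

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)

print(f"ROC-AUC:           {auc:.3f}")
print(f"Average precision: {ap:.3f}")
print(f"Active prevalence: {prevalence:.3f}")
```

On this toy set the ROC-AUC is about 0.90, which looks strong, while the average precision is only about 0.05: roughly 99 inactives are ranked above the first active, so a top-of-the-list selection would contain almost no actives.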
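As a rough illustration of how conformal prediction (CP) attaches an uncertainty estimate to a prediction, the sketch below implements split conformal regression in plain Python. It is a minimal sketch, not the authors' method: the calibration values, the predicted potency of 6.4, and the function names `conformal_quantile` and `prediction_interval` are all hypothetical, and the coverage statement assumes calibration and test molecules are exchangeable.

```python
import math


def conformal_quantile(residuals, alpha):
    """Split-conformal quantile of absolute calibration residuals.

    Returns the k-th smallest residual with k = ceil((n + 1) * (1 - alpha)),
    or infinity when the calibration set is too small for the requested level.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]


def prediction_interval(y_pred, q):
    """Symmetric interval around a point prediction."""
    return (y_pred - q, y_pred + q)


# Hypothetical held-out calibration set: measured vs. predicted potency (e.g. pIC50).
calibration_true = [5.0, 6.2, 7.1, 4.8, 6.5, 5.9, 7.4, 6.0, 5.3, 6.8]
calibration_pred = [5.3, 6.0, 6.5, 5.0, 6.6, 6.4, 7.0, 6.9, 5.2, 6.1]

residuals = [abs(t - p) for t, p in zip(calibration_true, calibration_pred)]

alpha = 0.2  # target miscoverage: intervals should contain the truth about 80% of the time
q = conformal_quantile(residuals, alpha)

# Interval for a new molecule whose (hypothetical) predicted potency is 6.4.
low, high = prediction_interval(6.4, q)
print(f"quantile = {q:.2f}, interval = [{low:.2f}, {high:.2f}]")
```

Molecules with narrow intervals, or whose whole interval lies above an activity threshold, could then be prioritized. With a single global quantile as here, every molecule gets the same interval width, so practical use would typically need a residual score that adapts to each molecule.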

Data-centric challenges with the application and adoption of artificial intelligence for drug discovery
