Abstract: A major challenge of precision oncology is the identification and prioritization of suitable treatment options based on molecular biomarkers of the considered tumor. In pursuit of this goal, large cancer cell line panels have successfully been studied to elucidate the relationship between cellular features and treatment response. Due to the high dimensionality of these datasets, machine learning (ML) is commonly used for their analysis. However, choosing a suitable algorithm and set of input features can be challenging. We performed a comprehensive benchmarking of ML methods and dimension reduction (DR) techniques for predicting drug response metrics. Using the Genomics of Drug Sensitivity in Cancer cell line panel, we trained random forests, neural networks, boosting trees and elastic nets for 179 anti-cancer compounds with feature sets derived from nine DR approaches. We compare the results regarding statistical performance, runtime and interpretability. Additionally, we provide strategies for assessing model performance compared with a simple baseline model and measuring the trade-off between models of different complexity. Lastly, we show that complex ML models benefit from using an optimized DR strategy, and that standard models, even when using considerably fewer features, can still be superior in performance.
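The abstract mentions assessing model performance against a simple baseline model. The sketch below is a minimal illustration, not the paper's procedure. It assumes that the baseline predicts the training-set mean response (for example, log IC50) for every test cell line, and that both models are scored by mean squared error. The function names `mse`, `baseline_predictions` and `relative_improvement` are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted drug responses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def baseline_predictions(y_train: Sequence[float], n_test: int) -> List[float]:
    """Assumed baseline: predict the mean training response for every test cell line."""
    return [fmean(y_train)] * n_test


def relative_improvement(y_train: Sequence[float], y_test: Sequence[float],
                         model_pred: Sequence[float]) -> float:
    """Share of the baseline's MSE that the model removes.

    1.0 means perfect predictions; 0.0 or less means the model does no better
    than always predicting the training mean.
    """
    baseline_mse = mse(y_test, baseline_predictions(y_train, len(y_test)))
    if baseline_mse == 0.0:
        raise ValueError("baseline MSE is zero, so the ratio is undefined")
    return 1.0 - mse(y_test, model_pred) / baseline_mse


if __name__ == "__main__":
    y_train = [1.0, 2.0, 3.0]  # toy responses, e.g. log IC50 values
    y_test = [1.5, 2.5]
    model_pred = [1.6, 2.4]    # predictions from some trained model
    print(f"{relative_improvement(y_train, y_test, model_pred):.2f}")  # prints 0.96
```

Raw MSE depends on each drug's response scale. Expressing the score relative to a per-drug baseline is one way to make results comparable across compounds.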

What problem does this paper attempt to address?

This paper addresses a core question in precision oncology: how to predict drug sensitivity with machine-learning methods. Specifically, the researchers must choose suitable algorithms and feature sets for high-dimensional datasets in order to predict how cancer cell lines respond to drugs. These datasets typically contain large amounts of gene expression data and other multi-omics measurements, so machine-learning techniques are needed to analyze them. Choosing an appropriate machine-learning algorithm and dimension-reduction method is itself a challenge, however. The paper therefore runs a comprehensive benchmark of machine-learning algorithms and dimension-reduction techniques for drug-sensitivity prediction, with the aim of identifying the best combination.

### Research Background

1. **Challenges in Precision Oncology**: One of the main goals of precision oncology is to identify and prioritize suitable treatment options based on the molecular biomarkers of a tumor.
2. **Large-scale Datasets**: Large cancer cell-line panels (such as GDSC and CCLE) provide multi-omics measurements across many cancer types together with drug-response metrics. These can be used to study the relationship between cellular features and treatment response.
3. **High-dimensional Data**: Because these datasets are high-dimensional, machine-learning methods are usually required to analyze them. Selecting an appropriate algorithm and input feature set remains challenging.

### Research Objectives

- **Evaluate Machine-Learning Algorithms**: Benchmark random forests, neural networks, boosted trees and elastic nets for drug-sensitivity prediction.
- **Select Dimension-Reduction Techniques**: Evaluate the effect of dimension-reduction techniques such as principal component analysis (PCA) and autoencoders.
- **Performance Comparison**: Compare the methods in terms of statistical performance, runtime and interpretability.
- **Evaluation Strategies**: Provide strategies for assessing model performance against a simple baseline model and for measuring the trade-off between models of different complexity.

### Methods

1. **Datasets**: Gene expression values and drug-response metrics (IC50 values) from the GDSC database.
2. **Model Training**: For 179 anti-cancer compounds, four machine-learning algorithms were combined with feature sets from nine dimension-reduction techniques, yielding more than 16 million models in total.
3. **Performance Evaluation**: Hyperparameters were selected by cross-validation (CV), and model performance was then assessed on a held-out test set.

### Results

- **Best Performance**: Elastic net models performed best and had the lowest runtime for most drugs, whereas neural networks performed worst.
- **Dimension-Reduction Techniques**: PCA and a heuristic based on minimum redundancy, maximum relevance (MRMR) were the most effective dimension-reduction techniques.
- **Feature Selection**: Feature-selection methods that take drug response into account performed better than methods based on expression values alone.

### Conclusions

- **Choosing Algorithms and Dimension-Reduction Methods**: The choice of machine-learning algorithm and dimension-reduction technique is crucial for drug-sensitivity prediction.
- **Effectiveness of Simple Models**: Standard models may still outperform complex models, even when they use considerably fewer features.
- **Optimization of Complex Models**: Complex prediction models can improve their performance through an optimized dimension-reduction strategy.
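The Results name a heuristic based on minimum redundancy and maximum relevance (MRMR) as one of the most effective dimension-reduction techniques. MRMR is a feature-selection method that takes drug response into account. The sketch below shows greedy MRMR under stated assumptions:

- Absolute Pearson correlation measures relevance (feature vs. drug response).
- The same correlation measures redundancy (feature vs. features already selected).
- The score is relevance minus mean redundancy.

MRMR is often defined with mutual information instead, and the paper's exact variant is not specified here. `pearson`, `mrmr_select` and the toy gene names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def mrmr_select(features: Dict[str, List[float]], response: List[float], k: int) -> List[str]:
    """Greedily select up to k feature names by relevance minus mean redundancy.

    features maps a feature name (e.g. a gene) to its values across cell lines;
    response holds the drug response (e.g. log IC50) for the same cell lines.
    """
    relevance = {name: abs(pearson(vals, response)) for name, vals in features.items()}
    selected: List[str] = []
    candidates = set(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]  # first pick: the most relevant feature
        redundancy = sum(abs(pearson(features[name], features[s])) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=score)  # sorting makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: the response is 2*gene_a + gene_c; gene_a_dup is a scaled copy of gene_a.
    features = {
        "gene_a": [1.0, -1.0, 1.0, -1.0],
        "gene_a_dup": [2.0, -2.0, 2.0, -2.0],
        "gene_c": [1.0, 1.0, -1.0, -1.0],
    }
    response = [3.0, -1.0, 1.0, -3.0]
    print(mrmr_select(features, response, k=2))  # prints ['gene_a', 'gene_c']
```

Ranking these toy features by relevance alone would select `gene_a` and its redundant copy `gene_a_dup`. The redundancy term instead favors the independent, less relevant `gene_c`.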
With these experiments, the authors aim to provide more reliable and efficient approaches to drug-sensitivity prediction in precision oncology.
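The Methods step selects hyperparameters by cross-validation and then scores the final model on a held-out test set. This can be illustrated with an elastic net, the model family the summary reports as performing best. The following is a minimal, dependency-free sketch, not the authors' pipeline, and it rests on several assumptions:

- It uses plain coordinate descent with the penalty parameterization popularized by scikit-learn.
- It tunes only the penalty strength `alpha`, with `l1_ratio` fixed at 0.5.
- It splits the data into contiguous, unshuffled folds.
- It runs on toy data.

All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Model = Tuple[float, List[float]]  # (intercept, weights)


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X: List[List[float]], y: List[float], alpha: float,
                    l1_ratio: float = 0.5, n_sweeps: int = 200) -> Model:
    """Coordinate descent minimising
    (1/2n)*||y - b - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]  # centred columns
    resid = [v - y_mean for v in y]  # residuals of the all-zero model
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature: its weight stays 0
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new_w
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return intercept, w


def predict(model: Model, X: List[List[float]]) -> List[float]:
    intercept, w = model
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cv_select_alpha(X: List[List[float]], y: List[float],
                    alphas: Sequence[float], k: int = 5) -> float:
    """Return the alpha with the lowest mean validation MSE over k contiguous folds."""
    n = len(X)
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    best_alpha, best_err = alphas[0], float("inf")
    for alpha in alphas:
        errors = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            model = fit_elastic_net([X[i] for i in train], [y[i] for i in train], alpha)
            errors.append(mse([y[i] for i in fold], predict(model, [X[i] for i in fold])))
        mean_err = sum(errors) / k
        if mean_err < best_err:
            best_alpha, best_err = alpha, mean_err
    return best_alpha


if __name__ == "__main__":
    # Toy data: the response depends on feature 0 only; feature 1 is irrelevant.
    X = [[i / 10, ((i * 7) % 11) / 10] for i in range(25)]
    y = [3.0 * row[0] for row in X]
    X_train, y_train, X_test, y_test = X[:20], y[:20], X[20:], y[20:]
    alpha = cv_select_alpha(X_train, y_train, alphas=[1.0, 0.1, 0.01, 0.001])
    model = fit_elastic_net(X_train, y_train, alpha)  # refit on all training data
    print(alpha, mse(y_test, predict(model, X_test)))
```

A realistic setup would shuffle cell lines before splitting and would also tune `l1_ratio`. A library routine such as scikit-learn's `ElasticNetCV` performs this search more efficiently.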

A comprehensive benchmarking of machine learning algorithms and dimensionality reduction methods for drug sensitivity prediction

Reliable anti-cancer drug sensitivity prediction and prioritization

A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer

SYSTEMATIC ASSESSMENT OF ANALYTICAL METHODS FOR DRUG SENSITIVITY PREDICTION FROM CANCER CELL LINE DATA

How to Predict Effective Drug Combinations - Moving beyond Synergy Scores

DBDNMF: A Dual Branch Deep Neural Matrix Factorization method for drug response prediction

Assessing Reusability of Deep Learning-Based Monotherapy Drug Response Prediction Models Trained with Omics Data

Machine learning model to predict oncologic outcomes for drugs in randomized clinical trials

Precision Anti-Cancer Drug Selection via Neural Ranking

Understanding the Sources of Performance in Deep Drug Response Models Reveals Insights and Improvements

A community effort to assess and improve drug sensitivity prediction algorithms

A systematic assessment of deep learning methods for drug response prediction: from in vitro to clinical applications

Harnessing machine learning potential for personalised drug design and overcoming drug resistance

Dual-Layer Strengthened Collaborative Topic Regression Modeling for Predicting Drug Sensitivity

Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours

Evaluating molecular representations in machine learning models for drug response prediction and interpretability

A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning

Learning and actioning general principles of cancer cell drug sensitivity

Deep learning methods for drug response prediction in cancer: Predominant and emerging trends

A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening

Abstract 5380: Systematic evaluation and comparison of drug response prediction models: a case study of prediction generalization across cell lines datasets