Abstract:

Background: In silico tools based on quantitative structure-activity relationship (QSAR) models are widely used to screen large compound databases and to predict the biological properties of molecules from their chemical structure. The exponentially growing volume of data on synthesized and known chemicals demands computationally efficient, automated QSAR modeling tools that are accessible to researchers who may lack extensive knowledge of machine learning. A fully automated, advanced modeling platform can therefore be an important addition for the QSAR community.

Results: In the presented workflow, the process from data preparation to model building and validation is completely automated. The work focuses on the most critical modeling tasks, which largely determine the performance of QSAR models: data curation, evaluation of data set characteristics, variable selection and validation. The workflow also includes the ability to quickly evaluate whether a given data set can feasibly be modeled. The framework was tested on data sets from thirty different problems. The best-optimized feature selection methodology in the workflow is able to remove 62-99% of all redundant data. On average, feature selection reduced the prediction error by about 19% and increased the percentage of variance explained (PVE) by 49% compared to models without feature selection. For models with a modelability score above 0.6, the average PVE was 0.71. A strong correlation was verified between the modelability scores and the PVE of the models produced with variable selection.

Conclusions: We developed an extendable, highly customizable and fully automated QSAR modeling framework. The workflow requires no advanced parameterization and does not depend on users' decisions or on expertise in machine learning or programming. Given only a target or problem, it follows an unbiased standard protocol to develop reliable QSAR models, either by directly accessing online manually curated databases or by using private data sets. Other distinctive features of the workflow include a prior estimate of data modelability, which avoids time-consuming modeling trials on non-modelable data sets; an efficient variable selection procedure; and the availability of output at each modeling task, which supports diverse applications and the reproduction of historical predictions. The results on a selection of thirty QSAR problems suggest that the approach is capable of building reliable models even for challenging problems.
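The abstract reports model quality as the percentage of variance explained (PVE) but does not state the formula. As a minimal sketch, assuming the common definition PVE = 1 - SSE/SST (which may differ from the authors' exact computation), the metric could be calculated as follows. The function name `pve` and the sample values are illustrative only and are not taken from the paper.

```python
def pve(y_true, y_pred):
    """Percentage of variance explained, assumed here to be 1 - SSE/SST.

    This definition is an assumption; the source does not give the formula.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared prediction errors.
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the observed values.
    sst = sum((t - mean_y) ** 2 for t in y_true)
    if sst == 0:
        raise ValueError("PVE is undefined when the observed values have zero variance")
    return 1.0 - sse / sst


# Illustrative activity values (made up for this example).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
score = pve(observed, predicted)
```

Under this definition, a model that always predicts the mean of the observed values scores 0, and a perfect model scores 1.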

An automated framework for QSAR model building
