vScreenML v2.0: Improved Machine Learning Classification for Reducing False Positives in Structure-Based Virtual Screening

Grigorii V Andrianov, Emeline Haroldsen, John Karanicolas
DOI: https://doi.org/10.1101/2024.10.08.617248
2024-10-12
Abstract: Enthusiastic adoption of make-on-demand chemical libraries for virtual screening has highlighted the need for methods that deliver improved hit-finding discovery rates. Traditional virtual screening methods are often inaccurate, with most compounds nominated in a virtual screen not engaging the intended target protein to any detectable extent. Emerging machine learning approaches have made significant progress in this regard, including our previously described tool vScreenML. Broad adoption of vScreenML was hindered by its challenging usability and dependencies on certain obsolete or proprietary software packages. Here, we introduce vScreenML 2.0 (https://github.com/gandrianov/vScreenML2) to address each of these limitations with a streamlined Python implementation. Through careful benchmarks, we show that vScreenML 2.0 outperforms other widely-used tools for virtual screening hit discovery.
Bioinformatics
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper addresses the high false-positive rate of structure-based virtual screening. Specifically:

1. **Limitations of traditional virtual screening methods**:
   - Traditional virtual screening methods are often inaccurate: most compounds nominated in a virtual screen do not engage the intended target protein to any detectable extent.
   - Experimentally testing these false positives wastes considerable time and resources.
2. **Shortcomings of existing tools**:
   - The authors previously developed vScreenML, a machine learning classifier for this task, but its broad adoption was hindered by its challenging usability (including a complex installation process) and its dependence on obsolete or proprietary software packages.
3. **Improvements in vScreenML 2.0**:
   - To overcome these limitations, the authors introduce vScreenML 2.0, a streamlined Python implementation that no longer depends on obsolete or proprietary software.
   - Beyond being easier to use, the new version incorporates new features intended to improve discrimination between active and inactive compounds.

Across a series of benchmarks, vScreenML 2.0 outperformed other widely used virtual screening tools, reducing false positives more effectively.
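The false-positive problem summarized above is commonly quantified as a hit rate: the fraction of compounds nominated from the top of a ranked list that turn out to be active. The snippet below is a minimal illustration under stated assumptions, not vScreenML's API or the paper's benchmark protocol. The function names `precision_at_k` and `false_positive_fraction` are hypothetical, higher scores are assumed to mean "more likely active", and the scores and activity labels in the usage lines are invented toy numbers, not data from the paper.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Hypothetical helper: fraction of the k top-scoring compounds that are actives.

    Assumes a higher score means the compound is predicted more likely to be active.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of compounds")
    # Rank compound indices by descending score and keep the top k "nominations".
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if is_active[i]) / k


def false_positive_fraction(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Hypothetical helper: fraction of the k nominated compounds that are inactive."""
    return 1.0 - precision_at_k(scores, is_active, k)


if __name__ == "__main__":
    # Invented toy labels and scores, for illustration only.
    labels = [True, False, False, True, False, False]
    docking_like = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]   # e.g. a raw docking-based ranking
    rescored = [0.8, 0.1, 0.3, 0.9, 0.2, 0.4]       # e.g. a classifier's rescoring
    print(false_positive_fraction(docking_like, labels, k=2))  # → 0.5
    print(false_positive_fraction(rescored, labels, k=2))      # → 0.0
```

Comparing two rankings at the same k shows the kind of improvement a rescoring classifier aims for: the same number of compounds is sent for testing, but fewer of them are inactive.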