PySmash: Python package and individual executable program for representative substructure generation and application

Zi-Yi Yang, Zhi-Jiang Yang, Yue Zhao, Ming-Zhu Yin, Ai-Ping Lu, Xiang Chen, Shao Liu, Ting-Jun Hou, Dong-Sheng Cao
DOI: https://doi.org/10.1093/bib/bbab017
IF: 9.5
2021-03-12
Briefings in Bioinformatics
Abstract:
Background: Substructure screening is widely applied in drug discovery pipelines to evaluate the potency and ADMET properties of compounds. It can also be used to interpret QSAR models and so guide the design of new compounds with desirable physicochemical and biological properties. As experimental data continue to accumulate, data-driven computational systems that can derive representative substructures from large chemical libraries are attracting increasing attention. An integrated and convenient tool for generating and applying representative substructures is therefore urgently needed.
Results: In this study, we developed PySmash, a user-friendly and powerful tool for generating different types of representative substructures. The current version of PySmash is provided both as a Python package and as an individual executable program, which makes it easy to operate and to integrate into pipelines. Three types of substructure generation algorithms are provided: circular, path-based and functional group-based. At run time, users can customize substructure size, accuracy and coverage, statistical significance and parallel computation. PySmash also provides a function for screening external data.
Conclusion: We present PySmash, a user-friendly and integrated tool for the automatic generation and application of representative substructures. Three screening examples, namely toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at https://github.com/kotori-y/pySmash.
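The abstract states that users can set thresholds on substructure accuracy, coverage and statistical significance. As a rough illustration of how such filters might work, the Python sketch below scores substructure keys on a labelled data set. The definitions are assumptions, not PySmash's documented formulas: accuracy is taken as the fraction of substructure-containing compounds that are positive, coverage as the fraction of positive compounds that contain the substructure, and significance as a one-sided hypergeometric (Fisher-type) enrichment p-value. The keys are hypothetical placeholders for fragments that a circular, path-based or functional group-based algorithm would produce. `score_substructures`, `select_representative` and their default thresholds are illustrative names only and are not part of the PySmash API.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """One-sided enrichment p-value: P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total compounds, K: positive compounds, n: compounds containing the
    substructure, k: positive compounds containing the substructure.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def score_substructures(molecules, labels, min_hits=2):
    """Score each substructure key by accuracy, coverage and enrichment p-value.

    molecules: list of sets of substructure keys (hypothetical labels standing
               in for circular, path-based or functional-group fragments).
    labels:    list of 0/1 activity labels aligned with `molecules`.
    """
    N = len(molecules)
    K = sum(labels)
    hits, pos_hits = Counter(), Counter()
    for frags, y in zip(molecules, labels):
        for f in frags:
            hits[f] += 1
            if y:
                pos_hits[f] += 1

    scores = {}
    for f, n in hits.items():
        if n < min_hits:  # skip substructures seen too rarely to judge
            continue
        k = pos_hits[f]
        scores[f] = {
            "accuracy": k / n,                # share of hits that are positive
            "coverage": k / K if K else 0.0,  # share of positives that are hit
            "p_value": hypergeom_sf(k, N, K, n),
        }
    return scores


def select_representative(scores, min_accuracy=0.7, min_coverage=0.1, max_p=0.05):
    """Keep substructures that pass all three (illustrative) thresholds."""
    return sorted(
        f for f, s in scores.items()
        if s["accuracy"] >= min_accuracy
        and s["coverage"] >= min_coverage
        and s["p_value"] <= max_p
    )


if __name__ == "__main__":
    molecules = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}, {"C"}, {"C"}]
    labels = [1, 1, 1, 0, 0, 0]
    scores = score_substructures(molecules, labels)
    for key in sorted(scores):
        print(key, scores[key])
    print(select_representative(scores))  # ['A']
```

In the toy data, key A occurs only in the three positive compounds (p = 1/20 = 0.05) and is the only key that passes all three thresholds. A real pipeline would derive the keys with a cheminformatics toolkit and may also want to correct the p-values for the many substructures tested at once.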
biochemical research methods, mathematical & computational biology