Prop3D: A flexible, Python-based platform for machine learning with protein structural properties and biophysical data

Eli J. Draizen,John Readey,Cameron Mura,Philip E. Bourne
DOI: https://doi.org/10.1186/s12859-023-05586-5
IF: 3.307
2024-01-06
BMC Bioinformatics
Abstract:Machine learning (ML) has a rich history in structural bioinformatics, and modern approaches, such as deep learning, are revolutionizing our knowledge of the subtle relationships between biomolecular sequence, structure, function, dynamics and evolution. As with any advance that rests upon statistical learning approaches, the recent progress in biomolecular sciences is enabled by the availability of vast volumes of sufficiently-variable data. To be useful, such data must be well-structured, machine-readable, intelligible and manipulable. These and related requirements pose challenges that become especially acute at the computational scales typical in ML. Furthermore, in structural bioinformatics such data generally relate to protein three-dimensional (3D) structures, which are inherently more complex than sequence-based data. A significant and recurring challenge concerns the creation of large, high-quality, openly-accessible datasets that can be used for specific training and benchmarking tasks in ML pipelines for predictive modeling projects, along with reproducible splits for training and testing.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?