Supervised Learning Model Predicts Protein Adsorption to Carbon Nanotubes

Nicholas Ouassil, Rebecca L. Pinals, Jackson Travis Del Bonis-O’Donnell, Jeffrey W. Wang, Markita P. Landry
DOI: https://doi.org/10.1101/2021.06.19.449132
2021-06-20
Abstract: Engineered nanoparticles are advantageous for numerous biotechnology applications, including biomolecular sensing and delivery. However, testing the compatibility and function of nanotechnologies in biological systems requires a heuristic approach, where unpredictable biofouling via protein corona formation often prevents effective implementation. Moreover, rational design of biomolecule-nanoparticle conjugates requires prior knowledge of such interactions or extensive experimental testing. Toward better applying engineered nanoparticles in biological systems, herein, we develop a random forest classifier (RFC) trained with proteomic mass spectrometry data that identifies proteins that adsorb to nanoparticles, based solely on the protein's amino acid sequence. We model proteins that populate the corona of a single-walled carbon nanotube (SWCNT)-based optical nanosensor and study whether there is a relationship between the protein's amino acid-based properties and the protein's adsorption to SWCNTs. We optimize the classifier and characterize the classifier performance against other models. To evaluate the predictive power of our model, we apply the classifier to rapidly identify proteins with high binding affinity to SWCNTs, followed by experimental validation. We further determine protein features associated with increased likelihood of SWCNT binding: high content of solvent-exposed glycine residues and non-secondary structure-associated amino acids. Conversely, proteins with high content of leucine residues and beta-sheet-associated amino acids are less likely to form the SWCNT protein corona. The classifier presented herein provides a step toward undertaking the otherwise intractable problem of predicting protein-nanoparticle interactions, which is needed for more rapid and effective translation of nanobiotechnologies from in vitro synthesis to in vivo use.
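To illustrate what "based solely on the protein's amino acid sequence" could mean in practice, the following is a minimal sketch of one sequence-derived feature type: the fraction of each standard amino acid in a sequence. It is an assumption made for illustration, not the authors' feature pipeline. The function name `composition_features`, the `frac_` key prefix, and the toy sequence are hypothetical, and the sketch does not compute the solvent-exposure or secondary-structure properties mentioned in the abstract.

```python
from collections import Counter
from typing import Dict

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence: str) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Hypothetical helper for illustration; characters outside the 20 standard
    one-letter codes are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {f"frac_{aa}": counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequence invented for this example, not taken from the paper's data.
features = composition_features("GGLAGK")
print(features["frac_G"], features["frac_L"])
```

A feature vector of this kind, computed per protein and paired with a binary "in corona / not in corona" label, is the sort of input a random forest classifier (for example, scikit-learn's `RandomForestClassifier`) could be trained on. Whether glycine or leucine content turns out to be informative would depend on the actual data.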