Enhancing Bioactive Compound Classification through the Synergy of Fourier-Transform Infrared Spectroscopy and Advanced Machine Learning Methods

Pedro N. Sampaio, Cecília C. R. Calado
DOI: https://doi.org/10.3390/antibiotics13050428
2024-05-09
Antibiotics
Abstract: Bacterial infections and resistance to antibiotic drugs represent the highest challenges to public health. The search for new and promising compounds with anti-bacterial activity is a very urgent matter. To promote the development of platforms enabling the discovery of compounds with anti-bacterial activity, Fourier-Transform Mid-Infrared (FT-MIR) spectroscopy coupled with machine learning algorithms was used to predict the impact of compounds extracted from Cynara cardunculus against Escherichia coli. According to the plant tissues (seeds, dry and fresh leaves, and flowers) and the solvents used (ethanol, methanol, acetone, ethyl acetate, and water), compounds with different compositions concerning the phenol content and antioxidant and antimicrobial activities were obtained. A principal component analysis of the spectra allowed us to discriminate compounds that inhibited E. coli growth according to the conventional assay. The supervised classification models enabled the prediction of the compounds' impact on E. coli growth, showing the following values for accuracy: 94% for partial least squares-discriminant analysis; 89% for support vector machine; 72% for k-nearest neighbors; and 100% for a backpropagation network. According to the results, the integration of FT-MIR spectroscopy with machine learning presents a high potential to promote the discovery of new compounds with antibacterial activity, thereby streamlining the drug exploratory process.
pharmacology & pharmacy, infectious diseases
What problem does this paper attempt to address?
The paper aims to develop fast, cost-effective systems for discovering bioactive compounds that have antibacterial activity but are not cytotoxic to the human host. Specifically, the research aims to screen for and discover new antibacterial molecules by combining Fourier-transform mid-infrared (FT-MIR) spectroscopy with machine-learning methods. The compounds studied were extracted from different parts of Cynara cardunculus (seeds, dried and fresh leaves, and flowers) using different solvents, and their effects on Escherichia coli were evaluated.

### Research Background

With antibiotic resistance increasing, there is an urgent need for fast, cost-effective systems to discover bioactive compounds with antibacterial activity. Antibiotic-resistant bacteria are a severe problem in modern medicine, both because new antibiotics are developed slowly and because multi-drug-resistance determinants are spreading ever faster. Plant extracts are an important source of bioactive compounds and are crucial for the development and synthesis of many drugs, including antibiotics. For example, Cynara cardunculus is widely used in traditional medicine. Its rhizome extract has antioxidant and antibacterial activities, while its leaves have diuretic, choleretic, and hepatoprotective effects, as well as in vitro antiproliferative potential against breast and cervical cancer cells.

### Research Methods

The research combined FT-MIR spectroscopy with machine-learning algorithms to predict the effects of compounds extracted from different parts of Cynara cardunculus on Escherichia coli. The specific steps are:

1. **Compound Extraction**: Compounds are extracted from the seeds, dried and fresh leaves, and flowers of Cynara cardunculus using different solvents (ethanol, methanol, acetone, ethyl acetate, and water).
2. **Spectral Analysis**: FT-MIR spectroscopy is used to obtain spectral data of Escherichia coli cells.
3. **Data Pre-processing**: The spectral data are pre-processed with methods including multiplicative scatter correction (MSC), standard normal variate (SNV) transformation, first-order derivative, and second-order derivative.
4. **Principal Component Analysis (PCA)**: The spectral data are analyzed with PCA to distinguish samples with antibacterial activity from those with low antibacterial activity.
5. **Machine-learning Classification**: Several supervised classification algorithms, namely partial least squares-discriminant analysis (PLS-DA), support vector machine (SVM), k-nearest neighbors (k-NN), and a backpropagation network (BPN), are used to model the spectral data and predict the effects of compounds on the growth of Escherichia coli.

### Main Results

- **Antibacterial Activity of Extracts**: Compounds extracted from seeds (with water and ethanol), leaves (with methanol), dried leaves (with water), and flowers (with ethanol and methanol) showed significant antibacterial activity.
- **Spectral Pre-processing**: MSC and the second-order derivative were the most effective pre-processing methods for distinguishing samples with antibacterial activity.
- **PCA Analysis**: In the PCA score plot, samples with antibacterial activity and those with low antibacterial activity form clearly separated clusters.
- **Machine-learning Classification**: The BPN model showed the highest accuracy (100%) in predicting the effects of compounds on the growth of Escherichia coli, followed by PLS-DA (94%), SVM (89%), and k-NN (72%).
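
The pre-processing methods named under Research Methods (SNV, MSC, and spectral derivatives) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the toy spectra are hypothetical, and the derivative is a plain finite difference, whereas real workflows often use a smoothing derivative such as Savitzky-Golay.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation of this spectrum
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction, using the mean spectrum as reference."""
    n_points = len(spectra[0])
    reference = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s = intercept + slope * reference (approximately)
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        # Remove the fitted additive and multiplicative effects
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def first_derivative(spectrum):
    """Finite-difference first derivative (uniform wavenumber spacing assumed)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


# Toy absorbance values (made up for illustration, not data from the paper).
base = [0.1, 0.4, 0.9, 0.3, 0.2]
spectra = [base, [2 * x + 0.5 for x in base], [0.5 * x - 0.1 for x in base]]

snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
# A second-order derivative is the first derivative applied twice.
second_derivatives = [first_derivative(first_derivative(s)) for s in msc_spectra]
```

In this toy set the three spectra differ only by an offset and a scale factor, so both SNV and MSC map them onto the same curve. Removing that kind of scatter-related variation is the purpose of these corrections.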
### Conclusion

The results show that combining FT-MIR spectroscopy with machine-learning methods has great potential for screening and discovering new antibacterial compounds, and could streamline the drug discovery process. The approach improves screening efficiency and reduces cost, providing a new tool for tackling antibiotic resistance.
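
To make the reported accuracy figures concrete, the following minimal sketch shows how a k-NN classifier, one of the four model types compared, could be scored on labeled spectral features. Everything in it is an assumption made for illustration: the two-dimensional feature vectors stand in for PCA scores or pre-processed spectra, the labels and values are made up, and leave-one-out validation is used only because it is simple. The summary does not state which validation scheme the authors used.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(x, y, k=3):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical feature vectors (for example, two PCA scores per sample).
features = [
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 0.95],   # labeled "active"
    [0.1, 0.2], [0.2, 0.1], [0.0, 0.15], [0.15, 0.05],  # labeled "inactive"
]
labels = ["active"] * 4 + ["inactive"] * 4

accuracy = leave_one_out_accuracy(features, labels, k=3)
print(f"Leave-one-out accuracy: {accuracy:.0%}")
```

Accuracy here is simply the share of held-out samples whose predicted class matches the class assigned by the conventional growth assay. The same scoring applies to PLS-DA, SVM, or a backpropagation network once the classifier is swapped.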