Deep learning-driven QSPR models for accurate properties estimation in organic solar cells using extended connectivity fingerprints

Mohammed Elkabous, Anass Karzazi, Yasser Karzazi
DOI: https://doi.org/10.1016/j.commatsci.2024.113146
IF: 3.572
2024-06-03
Computational Materials Science
Abstract: Bulk heterojunction (BHJ) solar cell materials represent a promising avenue for enhancing environmental stability and practicality in solar cell technology. However, the vast array of potential donor and acceptor materials makes it difficult to identify the most advantageous options. In this context, we demonstrate the potential of machine learning models, specifically one-dimensional convolutional neural networks (1D CNNs) and feedforward neural networks (FNNs), with traditional random forests (RF) as a baseline, to exploit intensive DFT computations for the fast and accurate estimation of essential properties of organic semiconductors. We construct a Quantitative Structure-Property Relationship (QSPR) model that takes Extended Connectivity Fingerprints (ECFPs) as input and predicts critical parameters including power conversion efficiency (PCE), highest occupied molecular orbital (HOMO) energy, and lowest unoccupied molecular orbital (LUMO) energy. To assess the effectiveness of the proposed model, we used the structures of 25,000 organic molecules for both training and testing. The model's accuracy was evaluated using several performance metrics: R², Mean Squared Error (MSE), Mean Absolute Error (MAE), and Pearson correlation (r). All three machine learning models exhibit substantial prediction accuracy, with the FNN model standing out as the most accurate predictor. It not only demonstrates noteworthy proficiency in forecasting specific properties but also predicts experimental PCE to within 0.45%. Moreover, the designed molecules (M1–M4) show potential for higher predicted PCE values than the reference TPA-R.
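The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and the formulas are the standard textbook definitions of MSE, MAE, R², and Pearson r, written in plain Python so the block has no dependencies.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R^2 and Pearson r for two equal-length sequences.

    Hypothetical helper for illustration; standard definitions only.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Error metrics: average squared and absolute residual.
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n

    # R^2 = 1 - SS_res / SS_tot (fraction of variance explained).
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # Pearson r = covariance / (std_true * std_pred), up to a common 1/n factor.
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    pearson_r = cov / math.sqrt(ss_tot * var_pred)

    return {"mse": mse, "mae": mae, "r2": r2, "pearson_r": pearson_r}


if __name__ == "__main__":
    # Made-up property values (e.g. PCE in %), not data from the paper.
    reference = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(regression_metrics(reference, predicted))
```

Note that R² and Pearson r answer different questions: r measures linear association only, so a model with a systematic offset can score a high r while its R² and error metrics remain poor. Reporting all four, as the abstract does, guards against that.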
materials science, multidisciplinary