TADF-GEN: An Iterative and Self-improving Method with Long-term Memory and Dynamic Similarity Weight for Generating Thermally Activated Delayed Fluorescence (TADF) Molecules

Longkun Xu, Mingwei Ge, Wei Shao, Haishun Jin, Mengxia Liu, Qiang Wang
DOI: https://doi.org/10.26434/chemrxiv-2024-jjjjk
2024-10-10
Abstract: Thermally activated delayed fluorescence (TADF) molecules are promising emitting-layer materials for the next generation of organic light-emitting diodes (OLEDs) in various display applications, but their high-throughput discovery and generation remain challenging because of the vast chemical space, the high cost of quantum chemical calculations, and the difficult exploration-exploitation trade-off. In addition, TADF molecules lie far from the chemical space covered by existing open-source databases, which hinders the use of these databases and makes the direct application of deep generative models difficult. To address these issues, we present TADF-GEN, an iterative and self-improving workflow. It combines atom-wise and fragment-based morphing operations designed with domain knowledge, machine-learning-based property prediction, multiscale quantum chemical calculations, and TADF-specific metrics with our implementation of long-term memory (LTM) and dynamic similarity weight (DSW). With TADF-GEN, we explored, for the first time, a chemical space that includes various types and sizes of TADF molecules, and established a new dataset of over 1.3 M molecules, of which over 39 K have TD-DFT-labelled data. By combining LTM with DSW, we find that improving molecular diversity inherently helps generate molecules with better performance. Moreover, our model can effectively navigate the chemical space across different TADF domains (from multiple-resonance type to donor-acceptor type). Accurate double-hybrid TD-DFT excited-state calculations show that our generated molecules are excellent in both TADF properties and diversity compared with their seed molecules. The TADF-GEN model and the generated database can be used directly in future TADF work. We expect the proposed workflow to be useful for molecular discovery and generation in other data-scarce domains as well, such as batteries and semiconductor materials.
Chemistry