Abstract: Machine learning has been widely validated and applied in chemoinformatics and has achieved outstanding results in the prediction, modification, and optimization of luminescent, magnetic, and electrode materials. Here, we propose a simple but effective approach that combines depth-first search traversal (DFST) with a Light Gradient Boosting Machine (LightGBM) model to search the chemical space of classic organic field-effect transistor (OFET) functional molecules. In total, 2,820,588 structurally distinct molecules based on two specific skeleton types are generated, which demonstrates the search efficiency of the DFST strategy. Because each molecule is generated directly as a simplified molecular-input line-entry system (SMILES) string (an alphanumeric description of its structure), the approach tackles the inverse design problem directly, and the generated set has 100% chemical validity. LightGBM's intrinsically distributed and efficient design enables much faster and more efficient training, which yields better model performance with less data. Finally, 184 of the 2.8 million molecules are screened out, and density functional theory (DFT) calculations are carried out to verify the accuracy of the predictions.
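The depth-first generation idea can be illustrated with a short, stdlib-only Python sketch. It is not the authors' generator: the skeleton (naphthalene, a small stand-in for the tetracene and pentacene skeletons), the per-site choices in `CHOICES`, and the `max_changes` limit are all hypothetical. The traversal visits each aromatic C-H site in turn; at every site it either leaves the C-H alone, swaps it for a pyridine-type nitrogen, or attaches a substituent, then recurses to the next site and backtracks. Because molecules are assembled directly as SMILES text and no choice alters the ring connectivity, every string stays on the same fused-ring scaffold.

```python
from typing import Iterator, List, Tuple, Union

# A skeleton token is either fixed SMILES text or a substitutable aromatic C-H
# site written as ("site", ring_closure_digits).
Token = Union[str, Tuple[str, str]]

# Hypothetical stand-in skeleton: naphthalene (c1ccc2ccccc2c1) with its eight
# C-H positions marked as sites. The paper uses tetracene/pentacene skeletons.
NAPHTHALENE: List[Token] = [
    ("site", "1"), ("site", ""), ("site", ""), "c2",
    ("site", ""), ("site", ""), ("site", ""), ("site", ""),
    "c2", ("site", "1"),
]

# Hypothetical per-site choices as (aromatic atom, branch). The first entry
# leaves the C-H unchanged; "n" swaps C-H for a pyridine-type nitrogen; the
# others attach a functional group to the ring carbon.
CHOICES: List[Tuple[str, str]] = [("c", ""), ("n", ""), ("c", "(F)"), ("c", "(C#N)")]


def dfs_enumerate(template: List[Token], choices: List[Tuple[str, str]],
                  max_changes: int) -> Iterator[str]:
    """Depth-first traversal over sites, yielding one SMILES string per leaf.

    choices[0] is treated as the unmodified site; every other choice counts
    as one modification toward max_changes.
    """
    n_sites = sum(isinstance(tok, tuple) for tok in template)
    unchanged = choices[0]
    picked: List[Tuple[str, str]] = []  # choices along the current DFS path

    def render() -> str:
        # Ring-closure digits go right after the atom and before any branch,
        # as in standard SMILES.
        parts: List[str] = []
        it = iter(picked)
        for tok in template:
            if isinstance(tok, tuple):
                atom, branch = next(it)
                parts.append(atom + tok[1] + branch)
            else:
                parts.append(tok)
        return "".join(parts)

    def visit(depth: int, changes: int) -> Iterator[str]:
        if depth == n_sites:  # every site assigned: emit one molecule
            yield render()
            return
        for choice in choices:
            cost = 0 if choice == unchanged else 1
            if changes + cost > max_changes:
                continue  # prune branches that modify too many sites
            picked.append(choice)
            yield from visit(depth + 1, changes + cost)
            picked.pop()  # backtrack before trying the next choice

    yield from visit(0, 0)


if __name__ == "__main__":
    for smiles in dfs_enumerate(NAPHTHALENE, CHOICES, max_changes=1):
        print(smiles)
```

Here `max_changes=1` yields 25 strings (the unmodified skeleton plus 8 sites × 3 non-default choices), and `max_changes=2` yields 277. Symmetry-equivalent substitution patterns come out as different strings; a real pipeline would parse and canonicalize each one with a cheminformatics toolkit such as RDKit (`Chem.MolFromSmiles`, `Chem.MolToSmiles`) to confirm validity and drop duplicates, which this sketch does not attempt.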

What problem does this paper attempt to address?

The problem this paper attempts to address is how to efficiently screen high-performance molecular materials for organic field-effect transistors (OFETs), particularly n-type semiconductor materials. Specifically, the authors propose a method based on depth-first search traversal (DFST) and the LightGBM machine learning model to generate and screen molecules with specific skeleton structures, and they verify the accuracy of the predictions with density functional theory (DFT) calculations.

### Main Issues

1. **Efficient Generation and Screening of Molecules**:
   - Traditional methods are time-consuming and costly when generating and screening large numbers of molecules.
   - An efficient method is needed to generate and preliminarily screen a large pool of candidate OFET molecular materials.
2. **Improving Prediction Accuracy**:
   - DFT calculations are accurate but computationally intensive and time-consuming.
   - Machine learning can predict the HOMO and LUMO energy levels of molecules quickly, but the accuracy of those predictions must be ensured.
3. **Balancing Screening Efficiency and Accuracy**:
   - When screening large datasets, a reasonable error range is needed to balance screening efficiency against prediction accuracy.
   - Overly strict accuracy requirements let too few molecules through, while overly loose ones let too many through, which increases the cost of subsequent DFT verification.

### Solutions

1. **Depth-First Search Traversal (DFST) Generator**:
   - Use the DFST algorithm to generate molecules with specific skeleton structures, such as tetracene and pentacene.
   - Generate a large number of molecular structures by replacing carbon atoms and adding functional groups.
2. **LightGBM Machine Learning Model**:
   - Use a LightGBM model with molecular fingerprints (ECFP4) to predict the HOMO and LUMO energy levels of molecules.
   - Quickly screen out molecules with high electron-transport performance.
3. **DFT Secondary Screening**:
   - Perform DFT calculations on the preliminarily screened molecules to verify their HOMO and LUMO energy levels.
   - Further refine the screening criteria so that the finally selected molecules have high performance.
4. **Optimizing Screening Criteria**:
   - Design a desired function to set the criteria for a reasonable error range.
   - Find the balance point between screening efficiency and data accuracy.

Through these methods, the authors generated 2,820,588 structurally distinct molecules and finally screened out 184 high-performance OFET molecular materials. This method is thousands of times faster than traditional high-throughput DFT screening while maintaining high prediction accuracy.
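The screening step, where predicted energy levels are filtered with an error allowance before the survivors go to DFT, can be sketched in a few lines of stdlib-only Python. This is a minimal sketch under stated assumptions: the predictions are taken as given (in the paper they come from LightGBM trained on ECFP4 fingerprints; ECFP4 corresponds to Morgan fingerprints of radius 2, for example in RDKit, and LightGBM provides `lightgbm.LGBMRegressor` for regression), the -4.0 eV LUMO cutoff and the molecule names and values are hypothetical, and the tolerance is a plain input rather than the output of the paper's desired function.

```python
from typing import Dict, List


def screen_by_lumo(predicted_lumo: Dict[str, float], lumo_max: float,
                   tolerance: float) -> List[str]:
    """Return molecules whose predicted LUMO (eV) is at most lumo_max + tolerance.

    The tolerance widens the cutoff to absorb model error. A larger value sends
    more candidates to DFT (more cost, fewer missed hits); a smaller value sends
    fewer (cheaper, but true hits near the cutoff may be lost).
    """
    cutoff = lumo_max + tolerance
    return sorted(name for name, lumo in predicted_lumo.items() if lumo <= cutoff)


if __name__ == "__main__":
    # Toy predictions for illustration only; not data from the paper.
    predicted = {"mol_a": -4.35, "mol_b": -4.05, "mol_c": -3.92, "mol_d": -3.60}
    # Hypothetical n-type target: LUMO <= -4.0 eV.
    for tol in (0.0, 0.1, 0.5):
        kept = screen_by_lumo(predicted, lumo_max=-4.0, tolerance=tol)
        print(f"tolerance={tol:.1f} eV -> {len(kept)} sent to DFT: {kept}")
```

With these toy values, widening the tolerance from 0.0 to 0.5 eV raises the number of candidates passed to DFT from 2 to 4, which is the efficiency-versus-accuracy trade-off the error-range criterion is meant to balance.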
