Uni-ELF Ideas & App Hands-on Explained

Uni-ELF

Uni-Mol

Piloteye

English

Bohrium Apps


chenx@dp.tech

Updated on 2024-09-09

Recommended image: Basic Image:ubuntu22.04-py3.10-irkernel-r4.4.1

Recommended machine type: c2_m4_cpu

Abstract

1. Background

2. Uni-ELF Framework Design: Multi-level Representation Learning

3. Uni-ELF Prediction Performance

3.1 Prediction of Electrolyte Molecular Properties

4. Application Example: Rediscovery of FAN Molecules

5. Uni-ELF in Practice: the Uni-ELF App on Piloteye®

Entering the Formulation Prediction Interface

Customize Input File: LiPF6-EC-DEC

Submitting the Job

Viewing the Result

©️ Copyright 2024 @ Authors
Author: chenx@dp.tech📨
Date: 2024-07-23
Sharing Agreement: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick Start: Click the Start Connection button above, select the bohrium-notebook:2023-04-07 image, and choose any configuration to start.

The main content is translated from an earlier Bohrium notebook (https://bohrium.dp.tech/notebooks/77711486918), which is in turn translated from the paper 10.48550/arXiv.2407.06152.

Abstract

In today's rapidly advancing battery technology landscape, the design and engineering of electrolytes have become critical factors driving progress in lithium battery technology. However, existing molecular design and formulation optimization approaches often lack an effective computational-experimental feedback loop. On one hand, electrolyte formulation design requires balancing multiple bulk property dimensions, making it difficult to achieve rapid and accurate optimization solely through experimental testing or simulation screening. On the other hand, formulation optimization ultimately depends on the performance of battery cells, and the field has long lacked an effective model to connect formulation composition with final cell performance indicators.

To address these challenges, DP Technology's research team has innovatively developed the Uni-ELF universal electrolyte formulation design framework. Through pre-training at both the molecular and formulation stages, Uni-ELF significantly outperforms existing advanced methods in predicting molecular properties (such as melting point, boiling point, and synthesizability) and formulation properties (such as conductivity and Coulombic efficiency). We anticipate this innovative framework will play a key role in AI-driven automated electrolyte design and engineering, contributing to breakthroughs in next-generation high-performance battery technologies.

1. Background

Lithium-based rechargeable batteries are the cornerstone of modern energy storage technology, offering the potential for high energy density, fast charging, and long lifespans. As the ionic conductor and electronic insulator between electrodes, the electrolyte must remain stable under extreme chemical conditions, with the interfaces between the electrolyte and other battery components playing a critical role. As we enter an era of high-energy-density batteries with more stringent electrolyte requirements—especially for high-voltage cathode materials and high-energy-density anode materials like lithium metal—electrolyte design and engineering have become major challenges. Current electrolyte systems based on ethylene carbonate (EC) are increasingly inadequate for these next-generation energy storage solutions. Thus, breakthroughs in materials and chemistry for future batteries hinge on mastering electrolyte design.

Electrolyte research and development face two primary challenges: innovative molecular design and electrolyte formulation optimization. These challenges arise from the need to fine-tune electrolyte conductivity, solubility, stability, and compatibility with electrode materials to meet stringent performance standards. Unlike fields such as drug design, formulation-level design of electrolytes is especially crucial, involving predictions and recommendations for mixing ratios of lithium salts, solvents, and functional additives. The interactions between these components can significantly impact the battery’s energy density, cycle life, and overall performance. The diversity of the molecular space further complicates the challenge of identifying candidates and potential mixtures, especially in multi-component systems.

The trial-and-error approach lacks the efficiency needed for the rapid development of new electrolyte systems. Over the past few decades, advancements in computational methods, such as density functional theory (DFT) and molecular dynamics, have enabled the analysis of dynamic behavior at the electronic and atomic levels, deriving macroscopic properties via statistical mechanics. However, the complexity inside batteries, especially on multi-scale levels, has hindered a full understanding of their mechanisms, making it difficult to develop efficient, predictive simulators for rational design.

On the other hand, data-driven approaches such as quantitative structure-property relationships (QSPR) have been developed, where molecular representations are obtained through feature engineering. Handcrafting features or descriptors requires extensive domain knowledge and is often disadvantageous when dealing with large-scale and high-dimensional problems. Additionally, the scarcity of data makes the transferability of data-driven models uncertain. The rapid development of deep learning, particularly in molecular representation learning and the pretraining-finetuning paradigm, has mitigated this issue. Among these methods, the Uni-Mol framework effectively incorporates 3D molecular information and has achieved widespread success in chemistry and materials science, including small organic molecules, organic light-emitting diodes (OLEDs), and metal-organic frameworks, focusing primarily on the relationship between individual molecules and their properties. However, similar approaches are lacking at the formulation level, and existing attempts are largely based on traditional regression methods and conventional machine learning models like random forests and XGBoost.

In this study, we introduce the Universal Electrolyte Formulation (Uni-ELF) framework, which excels in predicting electrolyte properties and designing electrolyte formulations through a multi-level pretraining scheme. At the molecular level, it leverages the Uni-Mol model to reconstruct 3D molecular structures, while at the mixture level, it predicts statistical structural properties derived from molecular dynamics simulations, such as radial distribution functions. Systematic experiments demonstrate that, after pretraining, Uni-ELF outperforms state-of-the-art (SOTA) methods across a wide range of tasks, accurately predicting key properties at both the molecular and mixture levels. Uni-ELF's performance is expected to improve further by integrating physics-driven modeling and high-quality data obtained through autonomous experiments. We believe that Uni-ELF not only represents an innovative approach unifying representation learning tasks across different levels of electrolytes but also serves as a timely and effective tool for industrial-scale intelligent battery design.

2. Uni-ELF Framework Design: Multi-level Representation Learning

Figure 1. Electrolyte formulation representation learning framework. a, Electrolyte design at multiple levels. At the atomic level, individual atoms and their interactions form molecular geometric structures, creating molecular-level representations. Based on these, individual molecular species, their proportions, and their interactions (depicted by red lines) within the mixtures create formulation-level representations, which are then used to predict device-level properties. b, Multi-level representation learning: b1. Molecule-level representations are learned through self-supervised tasks, including recovering masked atom types and denoising atom pair distances. b2. These refined representations are then fed with mixture ratios into the Uni-ELF backbone. c, Uni-ELF backbone model architecture. The Uni-ELF model is based on a transformer encoder design. Molar ratios are used as weights for molecular representations, and pair representations are maintained for mixture-level pretraining. Symmetrical elements in the pair representation matrix are summed and combined with the radial features obtained from the Gaussian kernel. These combined features are then used to predict radial distribution functions (RDFs), a pretraining task to recover the structural properties of the mixed system.

To improve predictive capabilities, the formulation model should incorporate specific inductive biases. Because an entity's characteristics are shaped not only by its intrinsic properties but also by its interactions with other entities, the model must differentiate the same molecular species across different contexts. It should also preserve permutation invariance with respect to the molecular input sequence, so that the output remains the same regardless of input order.

To achieve these goals, we designed the Uni-ELF backbone using a transformer encoder architecture, as shown in Figure 1(b2, c). At the formulation level, the model processes molecular representations weighted by their molar ratios, refining representations of individual molecular species and their interactions. These refined representations are then aggregated according to their molar ratios. For tasks involving ambient temperature, we introduced a temperature embedding block that uses a Gaussian kernel. This block encodes temperature values through a set of Gaussian basis functions with specified means and standard deviations.
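The two ingredients named above, molar-ratio-weighted aggregation and a Gaussian-kernel temperature embedding, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the basis centers and widths, and the toy 2-dimensional representations are all hypothetical, and the real backbone refines the representations with a transformer encoder before aggregating them.

```python
import math

def gaussian_embed(value, means, stds):
    """Encode a scalar as the responses of a set of Gaussian basis functions."""
    return [math.exp(-0.5 * ((value - m) / s) ** 2) for m, s in zip(means, stds)]

def aggregate(reps, ratios):
    """Molar-ratio-weighted sum of per-molecule representation vectors."""
    total = sum(ratios)
    dim = len(reps[0])
    return [sum(r / total * rep[k] for rep, r in zip(reps, ratios)) for k in range(dim)]

# Hypothetical basis: centers every 20 K from 240 K to 360 K, width 20 K.
means = [240.0 + 20.0 * k for k in range(7)]
stds = [20.0] * len(means)
t_emb = gaussian_embed(300.0, means, stds)

# Toy representations for three species (made-up numbers).
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ratios = [0.5, 0.3, 0.2]
pooled = aggregate(reps, ratios)

# Reordering the species leaves the pooled vector unchanged.
pooled_perm = aggregate(reps[::-1], ratios[::-1])
```

Because the aggregation is a weighted sum, it is permutation invariant by construction, which is the property the previous paragraph asks for.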

The model is pre-trained to predict solution structures, thus learning formulation representations. Given the scarcity of experimental data, we supplemented it with physical modeling, which provides additional structural data for transfer learning. Within the Uni-ELF framework, molecular dynamics simulations generate extensive trajectory data of solution particles. These trajectories are statistically averaged to extract structural features of the solution. Specifically, radial distribution functions (RDFs) describe how the density of neighboring particles varies with distance from a reference particle, revealing the fine structure of the solution. Pairwise RDFs (detailed in the supplementary information) are particularly suited for edge-level tasks within the transformer encoder, making them ideal for use as pre-training data.
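To make the pre-training target concrete, the sketch below computes a pairwise RDF from a single configuration by histogramming distances. It is a minimal illustration, not the pipeline used by the authors: it assumes a cubic periodic box, two distinct species, and `r_max` no larger than half the box length, and the function name `pair_rdf` is hypothetical. In practice the histogram would be averaged over many MD frames.

```python
import math

def pair_rdf(pos_a, pos_b, box, r_max, n_bins):
    """Histogram estimate of g_ab(r) for one frame in a cubic periodic box.

    pos_a, pos_b: lists of (x, y, z) coordinates of two distinct species.
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for xa in pos_a:
        for xb in pos_b:
            d2 = 0.0
            for k in range(3):
                delta = xa[k] - xb[k]
                delta -= box * round(delta / box)  # minimum-image convention
                d2 += delta * delta
            d = math.sqrt(d2)
            if d < r_max:
                counts[min(int(d / dr), n_bins - 1)] += 1
    rho_b = len(pos_b) / box ** 3  # number density of species b
    g = []
    for i, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((i + 1) * dr) ** 3 - (i * dr) ** 3)
        # Normalize by the count expected for an ideal gas at density rho_b.
        g.append(c / (len(pos_a) * rho_b * shell))
    return g

g = pair_rdf([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], box=10.0, r_max=5.0, n_bins=5)
g_wrapped = pair_rdf([(0.0, 0.0, 0.0)], [(9.5, 0.0, 0.0)], box=10.0, r_max=5.0, n_bins=5)
```

The second call shows the periodic wrap: a particle at x = 9.5 in a box of length 10 is only 0.5 away from the origin, so it lands in the first bin.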

During pre-training, Uni-ELF receives not only the molecular species and their molar ratios but also a series of radial distance values. These radial distances are embedded using a Gaussian kernel. The model maintains pairwise representations of molecular species, leveraging the inherent symmetry of RDFs between molecules. Specifically, the attention representations of matrix elements $[i, j]$ and $[j, i]$ are summed to form the pairwise representation. This summed representation is then concatenated with the embedded radial distance values to predict the RDF $g_{ij}(r)$ of molecular pair $(i, j)$ at a given radial distance $r$.
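The symmetrization and concatenation steps can be sketched as follows. The helper names, the nested-list layout of the pair representation, and the radial basis parameters are assumptions made for illustration; the prediction head that maps this input to $g_{ij}(r)$ is not shown.

```python
import math

def symmetrize(pair, i, j):
    """Sum the [i, j] and [j, i] pair features so the result is order independent."""
    return [a + b for a, b in zip(pair[i][j], pair[j][i])]

def radial_embed(r, centers, width):
    """Gaussian-kernel embedding of a radial distance."""
    return [math.exp(-0.5 * ((r - c) / width) ** 2) for c in centers]

def rdf_head_input(pair, i, j, r, centers, width):
    """Concatenate the symmetrized pair features with the embedded distance."""
    return symmetrize(pair, i, j) + radial_embed(r, centers, width)

# Toy 2x2 pair representation with 2 features per element (made-up numbers).
pair = [
    [[0.0, 0.0], [1.0, 2.0]],
    [[0.5, -1.0], [0.0, 0.0]],
]
x_01 = rdf_head_input(pair, 0, 1, r=2.0, centers=[1.0, 2.0, 3.0], width=0.5)
x_10 = rdf_head_input(pair, 1, 0, r=2.0, centers=[1.0, 2.0, 3.0], width=0.5)
```

Since `x_01` and `x_10` are identical, any head applied to them satisfies $g_{ij}(r) = g_{ji}(r)$ automatically.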

In predicting RDFs, the model achieved a root mean square error (RMSE) of 0.06 on the final test set. As shown in Figure 2, the strong alignment between the predicted and true RDFs in the test set, including for the LiPF6/PC/EMC system, highlights the accuracy of the Uni-ELF model during pre-training. This high level of accuracy in reproducing formulation structural information suggests that these learned representations are likely to transfer effectively to downstream property prediction tasks.

Figure 2. Prediction of molecular pairwise RDFs as a formulation-level pretraining task, using the LiPF₆/PC/EMC system with a molar ratio of n(Li⁺) : n(PF₆⁻) : n(PC) : n(EMC) = 0.12 : 0.12 : 0.54 : 0.22 as an example. The plots compare the true values obtained from molecular dynamics (MD) simulations (blue) with the predicted values from the Uni-ELF model (orange) for various molecular pairs among PF₆⁻, Li⁺ and EMC, including all pairwise combinations forming a lower triangular matrix. The right panel illustrates the system configuration. The strong agreement between predicted and true RDFs demonstrates the accuracy of the Uni-ELF model during pretraining.

3. Uni-ELF Prediction Performance

3.1 Prediction of Electrolyte Molecular Properties

The research team first utilized Uni-ELF’s molecular representation capabilities to predict key properties critical to electrolyte design. As shown in Figure 3, Uni-ELF outperformed state-of-the-art quantitative structure-property relationship (QSPR) methods in predicting fundamental physical properties (melting point, boiling point, vapor pressure, dielectric constant, refractive index, and density) as well as molecular synthesizability. This demonstrates the potential of the Uni-ELF framework in identifying novel electrolyte molecules across a broad chemical space.

Figure 3. Comparative performance in predicting molecular properties for electrolyte design. Uni-ELF (in purple) surpasses previously reported state-of-the-art (SOTA) methods (in blue) on seven molecular properties that are essential for the inverse molecular design of electrolytes: melting point, boiling point, vapor pressure, dielectric constant, refractive index, and density (scored by R²), and synthesizability (scored by AUC). Each concentric circle represents an interval of 0.05, with the outermost boundary corresponding to a perfect score of 1.0.

In detail, for melting point prediction Uni-ELF achieved an R² of 0.857 and an RMSE of 34.31 °C, surpassing the previous benchmark (R² of 0.830, RMSE of 36.88 °C). In boiling point and vapor pressure predictions, Uni-ELF outperformed the OPERA model, achieving an R² of 0.975 and an RMSE of 13.49 °C for boiling point, and an R² of 0.951 and an RMSE of 0.79 log(mmHg) for vapor pressure. Additionally, it outperformed QSPR models in predicting dielectric constant, refractive index, and density, with R² values of 0.966, 0.982, and 0.992, and corresponding RMSE values of 2.70, 0.082, and 0.025 g/cm³, respectively. These results highlight the advantage of representation learning over traditional QSPR methods in predicting molecular properties.
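For readers less familiar with the two regression metrics quoted above, the sketch below spells out their standard definitions. The sample values are made up purely to exercise the functions and are unrelated to the reported results.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values: one of three predictions is off by 1.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
example_rmse = rmse(y_true, y_pred)
example_r2 = r2(y_true, y_pred)
```

RMSE carries the unit of the property (for example °C), whereas R² is dimensionless, which is why Figure 3 can place different properties on a common radial scale.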

To further explore the model’s ability to identify promising electrolyte molecules, we evaluated its performance in predicting molecular synthesizability. Predicting the synthesizability of new molecules is a challenging task, typically reliant on the intuition and experience of chemists. Lee et al. curated a dataset from QM9, consisting of 126,405 entries, to assess molecular synthesizability. Molecules from QM9 were classified as synthesizable if they were listed in the PubChem or eMolecules databases, while those not listed were assumed to be unsynthesizable. In this task, our model achieved an AUC of 0.965, surpassing the previous best of 0.955. Although the absence of certain molecules in these databases does not definitively indicate they are unsynthesizable, it provides valuable insights into the relative ease or difficulty of synthesis. By integrating the conditions required for electrolytes (such as wide liquid range and lithium salt solubility) with the trained models for melting point, boiling point, dielectric constant, and synthesizability, our approach offers a powerful reference for assessing the potential suitability and synthetic feasibility of virtually generated molecules as electrolytes.

Table 1. RMSE results on the Coulombic efficiency and liquid electrolyte conductivity datasets for different methods and configurations, with the best RMSE denoted in bold. The random split column represents the data randomly divided into training and test sets, while the group split column represents the data grouped by formulation systems containing identical sets of molecular species and randomly split into training and test sets according to their group. Results are reported as the mean of three independent experiments, with standard deviation in parentheses.

Figure 4. Regression plots for electrolyte formulation property prediction using Uni-ELF. (a) Results of the Coulombic efficiency dataset. (b,c) Liquid electrolyte conductivity dataset, with (b) representing the random split and (c) the group split. The regression plots show the parity between experimental and predicted values in the test sets, with insets showing the results in the training sets. To illustrate data distribution, kernel density estimation is displayed at the top and right of each plot. The color gradients in the plots indicate the magnitude of prediction errors.

Specifically, we reviewed and corrected two original datasets: one for Coulombic efficiency (CE) of lithium metal anode batteries and another for electrolyte conductivity. For the CE dataset, we removed duplicate entries with the same ratio but different measurement methods or values and corrected errors in some ratios and molecular information. This resulted in a dataset containing 149 logarithmic Coulombic efficiency (LCE) entries, where LCE is defined as −log(1−CE). For the conductivity dataset, we similarly corrected errors and filtered out polymer electrolytes to focus on liquid electrolytes. The final conductivity dataset, compiled at various temperatures, contains 2,588 entries.
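The LCE transform stretches the region near CE = 1, where practically relevant electrolytes cluster. The sketch below assumes a base-10 logarithm, which the text does not state explicitly; the function names are hypothetical.

```python
import math

def lce(ce):
    """Logarithmic Coulombic efficiency, -log10(1 - CE), for 0 <= CE < 1.

    The base-10 logarithm is an assumption of this sketch.
    """
    if not 0.0 <= ce < 1.0:
        raise ValueError("CE must lie in [0, 1)")
    return -math.log10(1.0 - ce)

def ce_from_lce(value):
    """Inverse transform, back to Coulombic efficiency."""
    return 1.0 - 10.0 ** (-value)

lce_99 = lce(0.99)
lce_999 = lce(0.999)
```

Under this assumption, each additional "nine" in CE raises LCE by one unit, so an RMSE on the LCE scale corresponds to a multiplicative error in the inefficiency 1 − CE.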

Both datasets were split into training and test sets at a 7:3 ratio. Additionally, to evaluate the model's ability to predict new formulation systems, we applied a group split to the conductivity dataset. In this method, data from formulation systems containing the same molecular species were grouped and then randomly assigned to the training or test set group by group. We used five-fold cross-validation during training to enhance the model’s robustness. The final model was an ensemble of the five models produced by the cross-validation folds, and the performance metrics were derived from their averaged predictions on the test set.
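A group split and a fold ensemble can be sketched as below. This is an assumption-laden illustration rather than the authors' code: the row layout, the function names, and the choice to apply the test fraction to the number of groups (not the number of rows) are all hypothetical.

```python
import random

def group_split(rows, test_fraction=0.3, seed=0):
    """Split rows so that no set of molecular species spans both subsets."""
    groups = {}
    for row in rows:
        key = tuple(sorted(row["species"]))  # same species set -> same group
        groups.setdefault(key, []).append(row)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test = [r for k in keys[:n_test] for r in groups[k]]
    train = [r for k in keys[n_test:] for r in groups[k]]
    return train, test

def ensemble_predict(models, x):
    """Average the predictions of the fold models (here: plain callables)."""
    return sum(m(x) for m in models) / len(models)

# Toy rows; "ratio_id" stands in for the actual ratios and measured values.
rows = [
    {"species": ["LiPF6", "EC", "DEC"], "ratio_id": 1},
    {"species": ["DEC", "EC", "LiPF6"], "ratio_id": 2},
    {"species": ["LiPF6", "PC", "EMC"], "ratio_id": 3},
    {"species": ["LiPF6", "PC"], "ratio_id": 4},
]
train, test = group_split(rows)
```

With a random split, rows 1 and 2 could land on opposite sides and the test would mostly measure interpolation in ratio; the group split forces every test formulation to involve a species combination unseen in training.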

We established several baseline methods for constructing formulation fingerprints at the molecular and formulation levels and used XGBoost for regression predictions. These methods included: one-hot encoding of all molecular types in the dataset, where the formulation fingerprint only contained molecular species and ratio information without structural details; Morgan fingerprints to encode molecular structures; and Uni-Mol fingerprints derived from the Uni-Mol pre-trained model, which did not dynamically adjust features. To improve prediction accuracy in the electrolyte context, we separated the formulation fingerprints into solvent and salt components. Specifically, the fingerprints for molecules or ions were weighted by their molar ratios to generate fingerprints for each part, which were then concatenated to form the complete formulation fingerprint. For the conductivity dataset, temperature was included as a one-dimensional feature in the formulation fingerprint.
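The baseline fingerprint construction described above can be sketched as follows, using one-hot vectors as the per-species fingerprints. The function names, the toy vocabularies, and the ratios are hypothetical; Morgan or Uni-Mol fingerprints would simply replace the one-hot vectors, and the resulting list would be passed to a regressor such as XGBoost.

```python
def weighted_sum(ratios, fingerprints):
    """Molar-ratio-weighted sum of per-species fingerprint vectors."""
    dim = len(next(iter(fingerprints.values())))
    out = [0.0] * dim
    for name, ratio in ratios.items():
        for k, v in enumerate(fingerprints[name]):
            out[k] += ratio * v
    return out

def formulation_fingerprint(solvents, salts, solvent_fps, salt_fps, temperature=None):
    """Concatenate the solvent part, the salt part, and optionally the temperature."""
    fp = weighted_sum(solvents, solvent_fps) + weighted_sum(salts, salt_fps)
    if temperature is not None:
        fp.append(temperature)  # one-dimensional temperature feature
    return fp

# One-hot fingerprints over toy vocabularies (illustrative only).
solvent_fps = {"EC": [1.0, 0.0], "DEC": [0.0, 1.0]}
salt_fps = {"LiPF6": [1.0]}

# Made-up molar ratios for a LiPF6-EC-DEC formulation at 298.15 K.
fp = formulation_fingerprint(
    solvents={"EC": 0.45, "DEC": 0.45},
    salts={"LiPF6": 0.10},
    solvent_fps=solvent_fps,
    salt_fps=salt_fps,
    temperature=298.15,
)
```

Unlike the Uni-ELF backbone, these fingerprints are fixed once computed: the representation of EC is the same whatever it is mixed with, which is the limitation the transformer-based model is designed to remove.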

Table 1 summarizes the performance of various molecular representation schemes across different tasks. Notably, all methods discussed significantly outperformed the recent results by Kim et al. Across all tasks, we observed a consistent performance trend: the pre-trained Uni-ELF model performed best, followed by the non-pre-trained Uni-ELF model, Uni-Mol fingerprints, Morgan fingerprints, and finally, one-hot embeddings. For example, on the LCE dataset, the pre-trained Uni-ELF model achieved an RMSE of 0.184, reducing the error by approximately 14% compared to the non-pre-trained Uni-ELF model, which had an RMSE of 0.215. Similarly, for the conductivity dataset, the pre-trained Uni-ELF model achieved RMSEs of 0.50 mS/cm (random split) and 2.15 mS/cm (group split), reducing the error by approximately 6% and 13%, respectively, compared to the non-pre-trained model.
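As a check on the quoted LCE figure, the relative error reduction follows directly from the two RMSE values:

$$\frac{\mathrm{RMSE}_{\text{non-pretrained}} - \mathrm{RMSE}_{\text{pretrained}}}{\mathrm{RMSE}_{\text{non-pretrained}}} = \frac{0.215 - 0.184}{0.215} \approx 0.144,$$

which is consistent with the stated reduction of approximately 14%.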

These performance results align with intuitive expectations. One-hot embeddings, as simple numerical representations without structural information, performed the worst. Morgan fingerprints, which capture some molecular-level features, showed moderate improvement. Uni-Mol fingerprints, which contain richer molecular structures, further enhanced performance. The non-pre-trained Uni-ELF model outperformed the Uni-Mol fingerprint with XGBoost, highlighting the effectiveness of the transformer-based Uni-ELF architecture. Finally, the pre-trained Uni-ELF model, incorporating richer formulation-level structural information, achieved the best performance across all tasks.

As illustrated in Figure 4, the consistency between Uni-ELF predictions and experimental results is evident. Specifically, Figure 4(c) shows that while the grouping method might introduce more bias (since some test data belong to groups absent from the training data), the predictions still follow a consistent trend. This demonstrates the robustness of the Uni-ELF model in handling diverse datasets and its ability to generalize well, even under challenging conditions.

In summary, the pre-trained Uni-ELF model sets a new benchmark for prediction accuracy in the field, demonstrating that capturing comprehensive molecular and formulation-level information is crucial for achieving superior performance in downstream tasks.

4. Application Example: Rediscovery of FAN Molecules

To demonstrate Uni-ELF's potential in molecular and formulation design, the research team showcased its application in rediscovering the fluoroacetonitrile (FAN) system. FAN, a high-conductivity solvent, was recently reported by Lu et al. in Nature.

As shown in Figure 5, the team began by using expert knowledge to limit the molecular search space. They focused on molecules containing cyano and fluoro groups, restricting the search to molecules with fewer than eight heavy atoms. Using graph theory, they generated thousands of potential molecules. Next, Uni-ELF was applied to predict the properties of these molecules, including the ionic conductivity of electrolyte formulations composed of these molecules paired with different lithium salts at various concentrations. The team filtered out unsuitable candidates by predicting melting/boiling points and analyzing synthetic feasibility using basic expert heuristics, then ranked the remaining molecules based on predicted conductivity.
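The screening funnel described above can be sketched as a filter followed by a ranking. Everything here is hypothetical: the `Candidate` fields stand in for Uni-ELF predictions, the liquid-range thresholds are arbitrary placeholders, the candidate values are made up, and the synthetic-feasibility heuristics are omitted.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    heavy_atoms: int
    has_cyano: bool
    has_fluoro: bool
    melting_point_c: float     # predicted, hypothetical value
    boiling_point_c: float     # predicted, hypothetical value
    conductivity_ms_cm: float  # predicted, hypothetical value

def screen(candidates, max_heavy_atoms=8, max_melting_c=-20.0, min_boiling_c=60.0):
    """Keep small cyano/fluoro molecules with a wide liquid range, ranked by conductivity."""
    kept = [
        c for c in candidates
        if c.has_cyano and c.has_fluoro
        and c.heavy_atoms < max_heavy_atoms
        and c.melting_point_c < max_melting_c
        and c.boiling_point_c > min_boiling_c
    ]
    return sorted(kept, key=lambda c: c.conductivity_ms_cm, reverse=True)

candidates = [
    Candidate("mol_A", 5, True, True, -40.0, 80.0, 30.0),
    Candidate("mol_B", 9, True, True, -50.0, 150.0, 45.0),   # too many heavy atoms
    Candidate("mol_C", 6, True, True, 10.0, 140.0, 25.0),    # melting point too high
    Candidate("mol_D", 7, True, True, -60.0, 120.0, 12.0),
    Candidate("mol_E", 4, True, False, -45.0, 82.0, 20.0),   # no fluoro group
]
ranked = [c.name for c in screen(candidates)]
```

The expert's role in this loop is confined to the keyword arguments and the functional-group flags; the property values themselves come from the model.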

Notably, despite the training data lacking FAN information, the zero-shot prediction process still identified FAN as the top candidate, as shown in Figure 5(c), highlighting Uni-ELF’s robustness and accuracy. For formulation properties, the team fine-tuned the model with a few samples at the predicted concentration of highest conductivity and boundary concentrations of interest. The model successfully generated conductivity-concentration curves that matched experimental data. Moreover, using only one room-temperature data point, the model was able to extrapolate the system’s high conductivity at lower temperatures.

This example demonstrates a complete design cycle with minimal expert intervention. Experts define functional groups and screening criteria based on chemical knowledge and design objectives, while Uni-ELF automates the rest. This highlights Uni-ELF's powerful capability to streamline and enhance the molecular and formulation design process.

Figure 5. Example of Uni-ELF for electrolyte molecule and formulation design.

5. Uni-ELF in Practice: the Uni-ELF App on Piloteye®

As the AI for Science paradigm continues to develop, using new technologies such as artificial intelligence to overcome the challenges of electrolyte research has become a key trend. The electrolyte molecule and formulation design module based on the Uni-ELF framework is now offered as an App service and integrated into DP Technology's Piloteye® intelligent battery design platform. It encompasses five functional modules: molecule generation, molecular property prediction, molecular redox property prediction, electrolyte property prediction, and formulation property prediction. It also offers a user-friendly and intuitive interface that simplifies the design and optimization process for electrolyte research.

In this section, we will demonstrate a brief use case of Uni-ELF.

App address: https://bohrium.dp.tech/apps/uni-elf

Entering the Formulation Prediction Interface

In this section, we focus only on the Formulation Properties Prediction task. Click the Formulation Properties Prediction button at the top to navigate to this function.

Customize Input File: LiPF₆-EC-DEC

While the Uni-ELF App currently supports multiple input methods, we showcase only the custom upload scheme. The user can upload a .csv file that follows the format pre-defined in exampleData.csv. The input data used in this case is collected from multiple previous studies and is shown in the figure below.
