Anuroop Sriram, Benjamin Kurt Miller, Ricky T. Q. Chen, Brandon M. Wood
Abstract: Material discovery is a critical area of research with the potential to revolutionize various fields, including carbon capture, renewable energy, and electronics. However, the immense scale of the chemical space makes it challenging to explore all possible materials experimentally. In this paper, we introduce FlowLLM, a novel generative model that combines large language models (LLMs) and Riemannian flow matching (RFM) to design novel crystalline materials. FlowLLM first fine-tunes an LLM to learn an effective base distribution of metastable crystals in a text representation. After converting to a graph representation, the RFM model takes samples from the LLM and iteratively refines the coordinates and lattice parameters. Our approach significantly outperforms state-of-the-art methods, increasing the generation rate of stable materials by over three times and increasing the rate for stable, unique, and novel crystals by $\sim50\%$, a huge improvement on a difficult problem. Additionally, the crystals generated by FlowLLM are much closer to their relaxed state when compared with another leading model, significantly reducing post-hoc computational cost.
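The abstract relies on two views of the same crystal: a text string that the LLM reads and writes, and numeric lattice and coordinate data that the flow model refines. The sketch below shows that round trip under stated assumptions: a simple line-based format (lattice lengths, then angles, then alternating element symbols and fractional coordinates). The exact format FlowLLM uses is not given here, and the names `Crystal`, `to_text`, and `from_text` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Crystal:
    """Minimal crystal: lattice parameters plus atoms in fractional coordinates."""
    lengths: Vec3            # a, b, c in angstroms
    angles: Vec3             # alpha, beta, gamma in degrees
    species: List[str]       # one element symbol per atom
    frac_coords: List[Vec3]  # fractional coordinates in [0, 1)


def to_text(crystal: Crystal) -> str:
    """Serialize a crystal into a plain-text block (hypothetical format) for LLM fine-tuning."""
    lines = [
        " ".join(f"{x:.1f}" for x in crystal.lengths),  # lattice lengths, 1 decimal
        " ".join(f"{x:.0f}" for x in crystal.angles),   # lattice angles, whole degrees
    ]
    for symbol, coords in zip(crystal.species, crystal.frac_coords):
        lines.append(symbol)                                 # discrete token: atom type
        lines.append(" ".join(f"{x:.2f}" for x in coords))  # continuous values, 2 decimals
    return "\n".join(lines)


def from_text(text: str) -> Crystal:
    """Parse a generated text block back into numeric lattice and coordinate data."""
    rows = text.strip().splitlines()
    lengths = tuple(float(x) for x in rows[0].split())
    angles = tuple(float(x) for x in rows[1].split())
    species: List[str] = []
    frac_coords: List[Vec3] = []
    # After the two lattice rows, rows alternate: element symbol, then its three coordinates.
    for symbol, coord_row in zip(rows[2::2], rows[3::2]):
        species.append(symbol.strip())
        frac_coords.append(tuple(float(x) for x in coord_row.split()))
    return Crystal(lengths, angles, species, frac_coords)
```

For a two-atom CsCl-type cell, `to_text(Crystal((4.1, 4.1, 4.1), (90.0, 90.0, 90.0), ["Cs", "Cl"], [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]))` produces six short lines, and `from_text` recovers the arrays. From these arrays a periodic graph could be built for the flow model (not shown). The rounding in `to_text` limits coordinate precision, which is one plausible reason a continuous refinement stage helps.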
What problem does this paper attempt to address?
This paper addresses a key challenge in materials discovery: how to efficiently generate crystalline materials that are stable, unique, and novel (abbreviated as S.U.N.). To this end, it proposes FlowLLM, a generative model that combines large language models (LLMs) with Riemannian flow matching (RFM) to design new crystalline materials. Materials discovery has great potential in fields such as carbon capture, renewable energy, and electronics, but the chemical space is so vast that exploring all possible materials experimentally is infeasible.
### Main Problems
1. **Exploration of the Chemical Space**: Due to the vastness of the chemical space, traditional experimental methods cannot effectively explore all possible material combinations.
2. **Efficiency of Generating Stable Materials**: Existing generative models produce stable materials at a low rate, so large amounts of computation are needed to screen out viable candidates.
3. **Generating High-Quality Materials**: Structures generated by existing methods are often far from their relaxed states, which makes the subsequent relaxation calculations expensive.
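One simple way to quantify how far a generated structure is from its relaxed state is the root-mean-square atomic displacement under periodic boundary conditions. The sketch below is not necessarily the measure used in the paper. It assumes the atoms are listed in the same order in both structures and that the cell is orthorhombic and unchanged by relaxation. The names `min_image_frac_delta` and `relaxation_rmsd` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def min_image_frac_delta(a: Coord, b: Coord) -> Tuple[float, ...]:
    """Shortest periodic displacement from a to b, in fractional coordinates in [-0.5, 0.5)."""
    return tuple((bi - ai + 0.5) % 1.0 - 0.5 for ai, bi in zip(a, b))


def relaxation_rmsd(generated: Sequence[Coord], relaxed: Sequence[Coord], lengths: Coord) -> float:
    """RMS atomic displacement in angstroms between generated and relaxed positions.

    Simplifying assumptions: matched atom order, and an orthorhombic cell with edge
    lengths `lengths` that does not change during relaxation.
    """
    total_sq = 0.0
    for a, b in zip(generated, relaxed):
        d = min_image_frac_delta(a, b)
        # Scale each fractional component by its cell edge to get a Cartesian displacement.
        total_sq += sum((di * li) ** 2 for di, li in zip(d, lengths))
    return math.sqrt(total_sq / len(generated))
```

The minimum-image step matters because an atom at fractional coordinate 0.98 that relaxes to 0.00 has moved 0.02 of the cell edge, not 0.98. A smaller value of this kind of metric means fewer relaxation steps are likely to be needed.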
### Solutions
The FlowLLM model proposed in the paper addresses these problems in two stages:
1. **Using an LLM to Generate Initial Materials**: First, a large language model (LLM) is fine-tuned to learn a base distribution of metastable crystals in a text representation. This stage exploits the LLM's strength in handling discrete values such as atom types.
2. **Using RFM for Iterative Refinement**: The LLM's samples are converted into graph representations and used as starting points for the RFM model, which iteratively refines the atomic coordinates and lattice parameters. This stage exploits RFM's strength in handling continuous values such as atomic positions and lattice geometry.
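The refinement stage can be sketched as numerical integration of a velocity field, starting from the LLM's sample at t = 0 and ending at t = 1. This is a simplified illustration, not the paper's implementation. It assumes fractional coordinates live on a flat torus, so every step wraps back into the unit cell. It handles only fractional coordinates; lattice parameters would be integrated analogously in their own non-periodic space. It uses plain Euler steps and replaces the trained network with a hypothetical `toward` field that points along the shortest periodic path to a known target. `wrap`, `euler_refine`, and `toward` are illustrative names.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A velocity field maps (state, time) to a tangent vector; in FlowLLM a trained network plays this role.
VelocityField = Callable[[Vector, float], Vector]


def wrap(delta: float) -> float:
    """Map a fractional-coordinate difference to the shortest periodic displacement in [-0.5, 0.5)."""
    return (delta + 0.5) % 1.0 - 0.5


def euler_refine(frac_coords: Sequence[float], velocity: VelocityField, n_steps: int = 50) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 on the flat torus [0, 1)^d."""
    x = list(frac_coords)  # the LLM sample is the starting point at t = 0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        # Euler step followed by wrapping back into the unit cell.
        x = [(xi + dt * vi) % 1.0 for xi, vi in zip(x, v)]
    return x


def toward(target: Sequence[float]) -> VelocityField:
    """Hypothetical stand-in for a trained model: the field that follows the torus geodesic to `target`."""
    def v(x: Vector, t: float) -> Vector:
        # Remaining periodic displacement divided by remaining time (t < 1 inside euler_refine).
        return [wrap(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]
    return v
```

For example, `euler_refine([0.95, 0.10, 0.50], toward([0.05, 0.12, 0.48]))` moves the first coordinate forward by 0.1 across the cell boundary rather than backward by 0.9. This is why periodic wrapping matters for atomic positions.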
### Main Contributions
1. **Innovative Hybrid Method**: FlowLLM combines the advantages of LLM and RFM, effectively bridging the gap between discrete and continuous modeling.
2. **Significant Performance Improvement**: Experimental results show that FlowLLM increases the rate of generating stable materials by more than three times compared to existing methods and increases the rate of generating S.U.N. materials by about 50%.
3. **Natural-Language Prompting Ability**: FlowLLM retains the natural-language prompting ability of the LLM and can generate materials conditioned on specified properties, such as high band gap and thermal stability.
### Experimental Results
- **Stability Rate**: 17.82% of the materials generated by FlowLLM are stable, of which 48% are novel, 58% are unique, and the S.U.N. rate is 4.92%.
- **Comparison with Existing Methods**: FlowLLM significantly outperforms existing generative models such as CDVAE, DiffCSP, FlowMM, and CrystalLLM on multiple metrics.
- **Closeness of Generated Structures to Relaxed States**: The structures generated by FlowLLM are closer to their relaxed states, significantly reducing the cost of subsequent calculations.
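Both rates in these results are fractions of all generated samples. The S.U.N. rate can therefore be read as the stability rate further filtered by uniqueness and novelty. The sketch below shows that bookkeeping under the assumption that each sample has already been relaxed and scored. `Sample`, `generation_rates`, and the default hull threshold are illustrative assumptions, not the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Sample:
    e_above_hull: float  # energy above the convex hull in eV/atom, after relaxation
    is_unique: bool      # not a duplicate of another generated structure
    is_novel: bool       # does not match any structure in the training set


def generation_rates(samples: Sequence[Sample], hull_threshold: float = 0.0) -> Dict[str, float]:
    """Stability and S.U.N. rates, both as fractions of ALL generated samples.

    The default criterion (stable means e_above_hull < 0.0 eV/atom) is an assumption.
    """
    n = len(samples)
    stable = [s for s in samples if s.e_above_hull < hull_threshold]
    sun = [s for s in stable if s.is_unique and s.is_novel]
    return {"stability_rate": len(stable) / n, "sun_rate": len(sun) / n}
```

With the figures reported above, 4.92 / 17.82 ≈ 0.28, so roughly 28% of FlowLLM's stable samples are also unique and novel.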
In conclusion, by combining the strengths of LLMs and RFM, FlowLLM provides an efficient method for generating high-quality materials and represents a significant advance for materials discovery.