Graphormer-IR: Graph Transformers Predict Experimental IR Spectra Using Highly Specialized Attention

Cailum Stienstra,Liam Hebert,Patrick Thomas,Alexander Haack,Jason Guo,Scott Hopkins

DOI: https://doi.org/10.26434/chemrxiv-2023-f38b5-v2

2024-01-18

Abstract: Given that Infrared (IR) spectroscopy is a crucial tool in various chemical and forensic domains, improved in silico methods for predicting experimental spectra are needed due to the time and accuracy limitations of ab initio methods. We employ Graphormer, a graph neural network (GNN) transformer, to predict IR spectra using only Simplified Molecular-Input Line-Entry System (SMILES) strings. Our dataset includes 53,528 high-quality spectra with elements H, C, N, O, F, Si, S, P, Cl, Br, and I in five phases. When using only atomic numbers for node encodings, Graphormer-IR achieved mean test Spectral Information Similarity (SIS_μ) of 0.8449 ± 0.0012 (n=5), surpassing the state-of-the-art Chemprop-IR (SIS_μ = 0.8409 ± 0.0014, n=5), with only 36% of the encoded information. Augmenting node embeddings with additional node-level descriptors in learned embeddings generated through a multi-layer perceptron improves scores to SIS_μ = 0.8523 ± 0.0006, a total improvement of 19.7σ. These improved scores show how Graphormer-IR excels in capturing long-range interactions like hydrogen bonding, anharmonic peak positions in experimental spectra, and stretching frequencies of uncommon functional groups. Scaling our architecture to 210 attention heads demonstrates specialist-like behavior for distinct IR frequencies that improves model performance. Our model utilizes novel architectures, including a global node for phase encoding, learned node feature embeddings, and a 1D smoothing CNN. Graphormer-IR’s innovations underscore its value over traditional message-passing neural networks (MPNNs) due to its expressive embeddings and ability to capture long-range intra-molecular relationships.
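The "global node for phase encoding" mentioned in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the `MolGraph` container, the `add_global_phase_node` helper, the placeholder phase labels, and the choice to link the global node to every atom are all hypothetical. The abstract states only that a global node carries the phase and that there are five phases.

```python
from dataclasses import dataclass

# Placeholder labels: the abstract says there are five phases but does not name them.
PHASES = ["phase_0", "phase_1", "phase_2", "phase_3", "phase_4"]


@dataclass(frozen=True)
class MolGraph:
    """Hypothetical minimal molecular graph (not a Graphormer data structure)."""
    atomic_numbers: tuple  # one atomic number per atom node
    bonds: tuple           # undirected (i, j) pairs of atom indices


def add_global_phase_node(graph, phase):
    """Append one virtual node tagged with the phase and link it to every atom.

    Assumption: the global node is connected to all atom nodes, so phase
    information is one hop away from each atom.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    n_atoms = len(graph.atomic_numbers)
    global_index = n_atoms  # the new node takes the next free index
    # Node tokens: ("atom", Z) for real atoms, ("phase", id) for the global node.
    nodes = [("atom", z) for z in graph.atomic_numbers]
    nodes.append(("phase", PHASES.index(phase)))
    # Keep the original bonds and add one edge from each atom to the global node.
    edges = list(graph.bonds) + [(i, global_index) for i in range(n_atoms)]
    return nodes, edges


# Heavy atoms of ethanol (SMILES "CCO"): C-C-O, hydrogens omitted for brevity.
ethanol = MolGraph(atomic_numbers=(6, 6, 8), bonds=((0, 1), (1, 2)))
nodes, edges = add_global_phase_node(ethanol, "phase_1")
print(nodes)
print(edges)
```

In a sketch like this, the same molecule measured in two phases yields two graphs that differ only in the token of the final node, which is one plausible way a single model could condition its prediction on phase.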

Chemistry

What problem does this paper attempt to address?

The paper aims to address the time and accuracy limitations of ab initio methods for predicting infrared (IR) spectra. Specifically:
1. **Limitations of existing methods**: Traditional ab initio methods have high time costs and limited accuracy when calculating infrared spectra. This is especially problematic when dealing with highly anharmonic vibrational modes, which require expensive computational corrections.
2. **Proposed method**: The paper employs Graphormer (a graph neural network transformer) to predict infrared spectra using only Simplified Molecular-Input Line-Entry System (SMILES) strings. Compared with existing message-passing neural network (MPNN)-based methods, it better captures long-range interactions such as hydrogen bonding, anharmonic peak positions in experimental spectra, and stretching frequencies of uncommon functional groups.
3. **Dataset and results**: The study uses a dataset of 53,528 high-quality spectra covering the elements H, C, N, O, F, Si, S, P, Cl, Br, and I in five phases. With atomic numbers alone as node encodings, Graphormer-IR reaches a mean test spectral information similarity (SIS_μ) of 0.8449 ± 0.0012, compared with 0.8409 ± 0.0014 for the current state-of-the-art Chemprop-IR model. Adding further node-level descriptors to the node embeddings raises the score to 0.8523 ± 0.0006.
In summary, the core objective of the paper is to develop an efficient and accurate method for predicting IR spectra that overcomes the limitations of traditional computational methods and supports applications in chemistry and forensics.
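To make the reported metric more concrete, the following is a minimal sketch of a spectral information similarity (SIS) score. It assumes the commonly used formulation based on spectral information divergence, SIS = 1 / (1 + SID), computed on Gaussian-smoothed, sum-normalized spectra. The function names, the smoothing width `sigma_bins`, and the floor `eps` are illustrative choices, not values taken from the paper.

```python
import math


def gaussian_smooth(y, sigma_bins):
    """Convolve a spectrum with a Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma_bins)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(y)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < n:  # bins outside the spectrum contribute nothing
                acc += w * y[idx]
        out.append(acc)
    return out


def sis(pred, target, sigma_bins=2.0, eps=1e-10):
    """Return 1 / (1 + SID) for two spectra sampled on the same frequency grid."""
    if len(pred) != len(target):
        raise ValueError("spectra must share the same grid")
    # Smooth, floor at eps so the logarithms stay finite, then normalize to sum 1.
    p = [max(v, eps) for v in gaussian_smooth(pred, sigma_bins)]
    q = [max(v, eps) for v in gaussian_smooth(target, sigma_bins)]
    sum_p, sum_q = sum(p), sum(q)
    p = [v / sum_p for v in p]
    q = [v / sum_q for v in q]
    # Spectral information divergence: symmetric sum of the two KL divergences.
    sid = sum(a * math.log(a / b) + b * math.log(b / a) for a, b in zip(p, q))
    return 1.0 / (1.0 + sid)


# Toy spectra: one peak, and copies of it shifted by one and by three bins.
reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
near = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
far = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
print(sis(reference, reference), sis(reference, near), sis(reference, far))
```

Under this formulation the score is 1 for identical spectra and decreases toward 0 as the normalized spectra diverge, so a small peak shift is penalized less than a large one.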

A Machine-Learned "Chemical Intuition" to Overcome Spectroscopic Data Scarcity

Predicting Infrared Spectra with Message Passing Neural Networks

Infrared Spectral Analysis for Prediction of Functional Groups Based on Feature-Aggregated Deep Learning

GT-NMR: a novel graph transformer-based approach for accurate prediction of NMR chemical shifts

Molecular Graph Enhanced Transformer for Retrosynthesis Prediction

Leveraging infrared spectroscopy for automated structure elucidation

GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction

Neural Network Approach for Predicting Infrared Spectra from 3D Molecular Structure

An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

Prediction of the Infrared Absorbance Intensities and Frequencies of Hydrocarbons: A Message Passing Neural Network Approach

QC-GN2oMS2: a Graph Neural Net for High Resolution Mass Spectra Prediction

Transfer learning based on atomic feature extraction for the prediction of experimental ¹³C chemical shifts

Enhancing Molecular Structure Elucidation: MultiModalTransformer for both simulated and experimental spectra

Conditional Molecular Generation Net Enables Automated Structure Elucidation Based on 13C NMR Spectra and Prior Knowledge

ChemGrapher: Optical Graph Recognition of Chemical Compounds by Deep Learning

GraphXForm: Graph transformer for computer-aided molecular design with application to extraction

Using Graph Neural Networks for Mass Spectrometry Prediction

Autoencoding Undirected Molecular Graphs With Neural Networks

MolGrapher: Graph-based Visual Recognition of Chemical Structures

Mass Spectra Prediction with Structural Motif-based Graph Neural Networks