A non-orthogonal representation of the chemical space

Tiago F. T. Cerqueira, Silvana Botti, Miguel A. L. Marques
2024-06-28
Abstract: We present a novel approach to generate a fingerprint for crystalline materials that balances efficiency for machine processing and human interpretability, allowing its application in both machine learning inference and understanding of structure-property relationships. Our proposed fingerprint has two components: one representing the crystal structure and the other characterizing the chemical composition. To represent the latter we construct a non-orthogonal space where each axis represents a chemical element and where the angle between the axes quantifies a measure of the similarity between them. The chemical composition is then defined by the point on the unit sphere in this non-orthogonal space. By utilizing dimension reduction techniques we can construct a two-dimensional global map of the space of the thermodynamically stable crystalline compounds. Despite their simplicity, such maps succeed in providing a physical separation of material classes according to basic physical properties. Moreover, this compositional fingerprint can be used as a versatile input for machine learning algorithms, supplanting conventional one-hot representations of the chemical composition.
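The following is a minimal sketch, not the authors' implementation, of how a composition can be placed on the unit sphere of a non-orthogonal space. It assumes that the similarity between elements i and j is stored as the cosine of the angle between their axes, G[i, j] = cos θ_ij, so that the squared norm of a composition vector c is c^T G c and the overlap of two compositions is a^T G b. The four-element list and all matrix values are hypothetical placeholders; the paper derives its angles from its own similarity measure.

```python
import numpy as np

# Hypothetical element axes and similarity (metric) matrix. These numbers are
# made-up placeholders, NOT the values used in the paper.
ELEMENTS = ["Na", "K", "Cl", "Br"]
# G[i, j] = cos(theta_ij): 1 on the diagonal; larger off-diagonal = more similar.
G = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])


def composition_vector(formula: dict[str, float]) -> np.ndarray:
    """Place a composition on the unit sphere of the non-orthogonal space."""
    c = np.zeros(len(ELEMENTS))
    for element, amount in formula.items():
        c[ELEMENTS.index(element)] = amount
    # In a non-orthogonal basis the squared norm uses the metric: |c|^2 = c^T G c.
    return c / np.sqrt(c @ G @ c)


def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two unit-norm compositions, a^T G b."""
    return float(a @ G @ b)


nacl = composition_vector({"Na": 1, "Cl": 1})
kbr = composition_vector({"K": 1, "Br": 1})
nabr = composition_vector({"Na": 1, "Br": 1})
print(round(similarity(nacl, kbr), 3))  # 0.773
```

With one-hot (orthogonal) axes, G would be the identity matrix and NaCl and KBr would have similarity 0 because they share no element. With the placeholder similarities Na~K and Cl~Br, their cosine becomes 1.7/2.2 ≈ 0.77.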
Materials Science
What problem does this paper attempt to address?
This paper addresses the problem of representing the chemical space of crystalline materials in a form that is both efficient for machine processing and interpretable by humans. Conventional representations of chemical composition, such as one-hot encoding, treat every element as an independent, orthogonal axis and therefore carry no information about how chemically similar two elements are. The authors propose a fingerprint with two components, one describing the crystal structure and one describing the chemical composition, so that the same representation can serve both machine learning and human understanding of structure-property relationships.

For the compositional part, they construct a non-orthogonal space in which each axis corresponds to a chemical element and the angle between two axes measures the similarity between the corresponding elements. A chemical composition is then defined by a point on the unit sphere in this space. Using dimensionality-reduction techniques, the space of thermodynamically stable crystalline compounds can be projected onto a two-dimensional global map that physically separates classes of materials.

The paper demonstrates how to use this fingerprint to build a structural map and to visualize how material properties vary across chemical space. The fingerprint can also replace one-hot encodings as the input to machine learning algorithms: when predicting material properties, it is competitive with existing approaches such as CrabNet and mat2vec while remaining simpler and easier to interpret. Although the method has limitations (for example, insufficient information for the rare gases and certain actinide elements), it provides a valuable tool for the human interpretation of computational predictions and high-throughput studies in materials science.