Counting Molecules: Python based scheme for automated enumeration and categorization of molecules in scanning tunneling microscopy images

Jack Hellerstedt, Aleš Cahlík, Martin Švec, Oleksandr Stetsovych, Tyler Hennen
DOI: https://doi.org/10.1016/j.simpa.2022.100301
2022-03-04
Abstract: Scanning tunneling and atomic force microscopies (STM/nc-AFM) are rapidly progressing to offer unprecedented spatial resolution of a diverse array of chemical species. In particular, they are employed to characterize on-surface chemical reactions by directly examining precursors and products. Chiral effects and self-assembled structures can also be investigated. This open source, modular, Python-based scheme automates the categorization of a variety of molecules present in medium-sized (10$\times$10 to 100$\times$100 nm) scanned probe images.
Mesoscale and Nanoscale Physics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to automatically count and classify molecules in scanning tunneling microscopy (STM) images. Specifically, the authors developed a Python-based scheme that processes medium-sized scanning probe images (10 × 10 to 100 × 100 nm) and automatically identifies, classifies, and gathers statistics on the molecules present.

### Main problems

1. **Limitations of manual processing**: Existing statistical analyses usually rely on manual counting, which limits the size and complexity of the data sets that can be processed.
2. **Deficiencies of existing tools**: Widely used scanning probe analysis tools (such as WSxM and Gwyddion) lack automated image processing and feature extraction. Image processing tools from biology (such as ImageJ) are not readily compatible with STM image data, and other options are mostly commercial, closed-source software.
3. **Diversity of adsorption configurations**: The same molecule can adsorb on the surface in different configurations (for example, different rotations or chiralities), which complicates classification.
4. **Computing resource requirements**: Machine-learning methods are effective, but training convolutional neural networks takes large amounts of data and computing resources, which is impractical for some applications.

### Solutions

To overcome these problems, the authors developed a lightweight tool that represents each molecule by a "fingerprint" of Zernike moments (its projection onto the Zernike polynomial basis set) and combines this fingerprint with other physical features (such as the maximum topographic height and contour perimeter) as input to a clustering algorithm. This approach has the following advantages:

- **Rotation invariance**: Zernike moment magnitudes are insensitive to rotation and mirroring and, once each molecule is centered and scaled to a common size, to translation and scaling as well, so molecules in different adsorption configurations can be matched effectively.
- **Efficiency**: With clustering methods such as the Birch algorithm, images containing hundreds of molecules can be processed quickly on a personal computer.
- **Flexibility**: The tool is written in Python with a modular design, so it is easy to modify and customize for different data sets.

In conclusion, the paper provides an open-source, modular, automated tool for extracting quantitative information from STM images, supporting the analysis and understanding of on-surface chemical reactions.
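
The Zernike fingerprint idea summarized above can be illustrated with a short NumPy sketch. This is not the paper's implementation: the function names `radial_poly`, `zernike_fingerprint`, and `toy_molecule`, the 65-pixel grid, the maximum order of 6, and the synthetic two-lobe "molecule" are all assumptions made for the example. It computes the magnitudes of the Zernike moments of a square image that has already been centered and cropped to the unit disk, then checks that the magnitudes are unchanged by a 90 degree rotation and by mirroring.

```python
import math
import numpy as np


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^m(rho), for n >= m >= 0 and n - m even."""
    out = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * math.factorial(n - k)
                 / (math.factorial(k)
                    * math.factorial((n + m) // 2 - k)
                    * math.factorial((n - m) // 2 - k)))
        out += coeff * rho ** (n - 2 * k)
    return out


def zernike_fingerprint(image, max_order=6):
    """Return |Z_nm| for all n <= max_order, 0 <= m <= n, n - m even.

    Assumes a square image whose molecule is already centered; the image
    is mapped onto [-1, 1] x [-1, 1] and only the unit disk is used.
    """
    size = image.shape[0]
    coords = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    pixel_area = (coords[1] - coords[0]) ** 2
    magnitudes = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            # Project the image onto the complex-conjugate basis function.
            basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
            z = (n + 1) / np.pi * np.sum(image[inside] * basis[inside]) * pixel_area
            magnitudes.append(abs(z))
    return np.array(magnitudes)


def toy_molecule(size=65):
    """Hypothetical asymmetric two-lobe height map standing in for a molecule."""
    coords = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(coords, coords)
    lobe_a = np.exp(-((x - 0.3) ** 2 + (y - 0.1) ** 2) / 0.02)
    lobe_b = 0.6 * np.exp(-((x + 0.2) ** 2 + (y + 0.3) ** 2) / 0.05)
    return lobe_a + lobe_b


molecule = toy_molecule()
fingerprint = zernike_fingerprint(molecule)
rotated = zernike_fingerprint(np.rot90(molecule))
mirrored = zernike_fingerprint(np.fliplr(molecule))
print("max change after 90 degree rotation:", np.max(np.abs(fingerprint - rotated)))
print("max change after mirroring:", np.max(np.abs(fingerprint - mirrored)))

# Assumed extra feature, following the text: append the maximum height.
features = np.append(fingerprint, molecule.max())
```

A 90 degree rotation maps the pixel grid onto itself, so the magnitudes agree to floating-point precision; rotations by arbitrary angles involve interpolation and preserve the fingerprint only approximately. In a fuller pipeline, feature vectors like `features` for every detected molecule could be stacked into a matrix and passed to a clustering routine such as scikit-learn's `Birch`.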