MFTr_Locator: a Novel Transformer Model Decoding Multi-Label Protein Subcellular Locations in Multi-Field ImmunohistoChemistry Image

Ziqian Wang, Kai Zou, Sihui Zhu, Peiqi Cai, Fan Yang
DOI: https://doi.org/10.1109/CSECS60003.2023.10428564
2023-01-01
Abstract: Proteins are biological macromolecules that play a vital role in building the cytoskeleton, catalyzing biological processes, and participating in immune responses. Loss or mislocalization of a protein from its proper subcellular location tends to lead to disease. Therefore, accurately predicting the subcellular location of proteins has become a hot research topic. However, proteins that occur in multiple subcellular locations have received little attention. To address this, a novel multi-label protein subcellular location predictor called MFTr_Locator was developed to predict the subcellular locations of multi-label proteins in multi-field immunohistochemistry (IHC) images. The proposed method consists of four main steps: splitting the IHC images into multi-field patches, image feature extraction, feature selection, and multi-label classification. First, each IHC image was split at three scales using sliding windows of three sizes, with each patch representing the region of abundant protein expression at the corresponding field size. Next, subcellular location features (SLFs) and a frequency-domain operator were employed to quantify the multi-scale IHC patch images, and stepwise discriminant analysis (SDA) was adopted to reduce the feature dimension. Finally, a concatenated feature combining the multi-field features of an IHC image was fed into the decoder of the Transformer to recognize the subcellular locations of multi-label proteins. To validate the proposed method, a benchmark dataset comprising 2100 single-label and 1200 multi-label IHC images from the Human Protein Atlas (HPA) was collected. The main contributions of our method are as follows. First, we employ multi-field images instead of single-field images for subcellular location recognition. Second, we use the Transformer's decoder as a multi-label classifier. In the experiments, the subset accuracy on mixed labels reaches 79.7%.
The promising results show that the developed model substantially outperforms current in-field methods and may serve as an effective analysis tool for protein subcellular location.
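Two ideas in the abstract can be made concrete with a small sketch: picking, for each field size, the window with the most protein signal, and scoring multi-label predictions by subset accuracy (the fraction of images whose predicted label set matches the true set exactly). The code below is illustrative only. The function names, the window sizes (8, 16, 32), the stride, and the use of summed pixel intensity as a stand-in for "abundant protein expression" are all assumptions, not details taken from the paper.

```python
def window_sum(image, row, col, size):
    """Sum the pixel values of the size x size window whose top-left corner is (row, col)."""
    return sum(sum(image[r][col:col + size]) for r in range(row, row + size))


def best_window(image, size, stride):
    """Return the top-left corner of the window with the largest summed intensity.

    Assumption: summed intensity stands in for protein abundance.
    Ties are resolved in favour of the first window encountered.
    """
    height, width = len(image), len(image[0])
    best_score, best_pos = float("-inf"), (0, 0)
    for row in range(0, height - size + 1, stride):
        for col in range(0, width - size + 1, stride):
            score = window_sum(image, row, col, size)
            if score > best_score:
                best_score, best_pos = score, (row, col)
    return best_pos


def multi_field_patches(image, sizes=(8, 16, 32), stride=4):
    """Extract one patch per field size (hypothetical sizes, not the paper's)."""
    patches = []
    for size in sizes:
        row, col = best_window(image, size, stride)
        patches.append([line[col:col + size] for line in image[row:row + size]])
    return patches


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label vector equals the true one exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)
```

Subset accuracy is a strict metric: a prediction that gets two of three locations right for an image counts as wrong, which is why it is commonly reported for multi-label problems alongside more lenient measures.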