A Novel Approach for Clothing Classification: Integrating CNN and Transformer in RST-Net

Chunxin Xu, Minghan Yuan
DOI: https://doi.org/10.1109/AINIT61980.2024.10581599
2024-03-29
Abstract: In clothing classification, variations in shooting distance and angle change the apparent size and shape of the clothing. To address this issue, this paper introduces RST-Net (Residual Spatial Transformer), a deep learning algorithm that combines the characteristics of a Convolutional Neural Network (CNN) and a Transformer. ResNet is employed as the base model to extract local features from clothing images, capturing their local information and spatial hierarchy. Spatial features at different scales are processed in parallel at different stages of ResNet, so the network can flexibly handle images of varying scale and shape. A Transformer encoder then encodes the features at different scales to capture global relationships within each image. By combining the CNN and the Transformer, RST-Net considers both local and global information in clothing images, which improves classification accuracy. To validate the method, experiments are conducted on a portion of the Fashion-MNIST dataset, where RST-Net achieves an accuracy of 93.1%. Compared with the original ResNet and Transformer base models, this is an accuracy increase of 2.8% and 3.3%, respectively, which the authors take as confirmation of the algorithm's advantage.
Computer Science
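
The data flow described in the abstract (feature maps at several scales, turned into tokens, then related to each other by an attention-based encoder) can be illustrated with a small standard-library sketch. This is not the authors' implementation. It assumes average pooling as a stand-in for the ResNet stages, uses single-head scaled dot-product self-attention with identity projections and no learned weights, and the names `avg_pool`, `to_tokens`, `self_attention`, and `encode_multiscale`, as well as the per-token scale tag, are hypothetical choices made only for illustration.

```python
import math

def avg_pool(grid, k):
    """Downsample a square 2D grid by averaging non-overlapping k x k blocks.
    Assumption: this stands in for a ResNet stage that reduces resolution."""
    n = len(grid) // k
    return [[sum(grid[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(n)] for i in range(n)]

def to_tokens(grid, scale_id):
    """Flatten a grid into tokens of the form [value, scale tag].
    The scale tag is a hypothetical marker of which scale a token came from."""
    return [[v, float(scale_id)] for row in grid for v in row]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Queries, keys, and values are the tokens themselves (identity projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(weights[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out

def encode_multiscale(image, pool_sizes=(1, 2, 4)):
    """Build tokens from several scales, let every token attend to all others,
    then mean-pool the attended tokens into one feature vector."""
    tokens = []
    for scale_id, k in enumerate(pool_sizes):
        tokens += to_tokens(avg_pool(image, k), scale_id)
    attended = self_attention(tokens)
    d = len(attended[0])
    return [sum(t[c] for t in attended) / len(attended) for c in range(d)]

# Toy 4 x 4 "image": pooling by 1, 2, and 4 yields 16 + 4 + 1 = 21 tokens.
image = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
features = encode_multiscale(image)
```

Because every token attends to tokens from all scales, fine and coarse views of the image influence each other, which is the kind of global relationship the abstract attributes to the Transformer encoder. A real model would replace the pooling with learned convolutional stages, add learned projections and a classification head, and be trained on data.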