Abstract: The ability to correctly classify and retrieve apparel images has a variety of applications important to e-commerce, online advertising, and internet search. In this work, we propose a robust framework for fine-grained apparel classification, in-shop retrieval, and cross-domain retrieval that eliminates the need for rich annotations such as bounding boxes, human joints, or clothing landmarks, and for training a bounding-box or key-landmark detector. Factors such as subtle appearance differences, variations in human pose, different shooting angles, apparel deformations, and self-occlusion add to the challenges of classifying and retrieving apparel items. Cross-domain retrieval is even harder because of the large variation between online shopping images, usually taken with ideal lighting, pose, a favorable angle, and a clean background, and street photos captured by users in complicated conditions with poor lighting and cluttered scenes. Our framework uses a compact bilinear CNN with the Tensor Sketch algorithm to generate embeddings that capture local pairwise feature interactions in a translation-invariant manner. For apparel classification, we pass the feature embeddings through a softmax classifier, while the in-shop and cross-domain retrieval pipelines use a triplet-loss-based optimization approach, such that the squared Euclidean distance between embeddings measures the dissimilarity between images. Unlike previous works that relied on bounding-box, key clothing landmark, or human-joint detectors to assist the final deep classifier, the proposed framework can be trained directly on the provided category labels or on generated triplets for triplet-loss optimization. Lastly, experimental results on the DeepFashion fine-grained categorization, in-shop retrieval, and consumer-to-shop retrieval datasets provide a comparative analysis with previous work in the domain.
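The two building blocks named in the abstract, Tensor Sketch based compact bilinear pooling and a triplet loss on squared Euclidean distances, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`sketch_matrix`, `compact_bilinear_pool`, `triplet_loss`), the feature and embedding sizes, and the margin of 0.2 are all assumptions chosen for illustration, and a real pipeline would take the local features from a CNN feature map rather than from random numbers.

```python
import numpy as np


def sketch_matrix(d, D, rng):
    """Count Sketch as a (d, D) matrix: each input dimension i is mapped to
    one random output bucket h[i] with a random sign s[i]."""
    h = rng.integers(0, D, size=d)          # random bucket per input dimension
    s = rng.choice([-1.0, 1.0], size=d)     # random sign per input dimension
    P = np.zeros((d, D))
    P[np.arange(d), h] = s
    return P


def compact_bilinear_pool(features, P1, P2):
    """Tensor Sketch approximation of bilinear (outer-product) pooling.

    features: (L, d) array of local descriptors, one row per spatial location.
    Returns a D-dimensional embedding.
    """
    D = P1.shape[1]
    # Two independent count sketches of every local descriptor.
    f1 = np.fft.rfft(features @ P1, axis=1)
    f2 = np.fft.rfft(features @ P2, axis=1)
    # Element-wise product in the frequency domain equals circular
    # convolution of the two sketches, which sketches the outer product.
    per_location = np.fft.irfft(f1 * f2, n=D, axis=1)
    # Sum pooling over locations discards where each descriptor occurred.
    return per_location.sum(axis=0)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on squared Euclidean distances (margin is an assumed value)."""
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, D, L = 8, 16, 5                      # illustrative sizes only
    P1, P2 = sketch_matrix(d, D, rng), sketch_matrix(d, D, rng)
    anchor, positive, negative = (
        compact_bilinear_pool(rng.normal(size=(L, d)), P1, P2) for _ in range(3)
    )
    print(triplet_loss(anchor, positive, negative))
```

Because the per-location sketches are summed, reordering the spatial locations leaves the embedding unchanged, which is the sense in which the pooled descriptor ignores position. For classification the same embedding would instead be fed to a softmax layer.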

A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition

Fine Classification Method of Product Image Based on Multi-Level Convolutional Neural Networks

Automatic Fast Classification of Product-Images with Class-Specific Descriptor

Fine-Grained Grocery Product Recognition by One-Shot Learning

Matryoshka Peek: Toward Learning Fine-Grained, Robust, Discriminative Features for Product Search

Product Recognition for Unmanned Vending Machines

An Improved Deep Learning Approach For Product Recognition on Racks in Retail Stores

Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step Towards the Screenless Retailing

Object and attribute recognition for product image with self-supervised learning

Fine-Grained Image Classification Via Spatial Saliency Extraction

Split-Check: Boosting Product Recognition Via Instance-Level Retrieval

Enhanced Self-Checkout System for Retail Based on Improved YOLOv10

An Effective Framework of Multi-Class Product Counting and Recognition for Automated Retail Checkout

Smart retail SKUs checkout using improved residual network

Learning More Discriminative Clues with Gradual Attention for Fine-Grained Visual Categorization

Attention-based cropping and erasing learning with coarse-to-fine refinement for fine-grained visual classification

Learning Visual Features from Product Title for Image Retrieval

Self-Supervised Fully Automatic Learning Machine for Intelligent Retail Container

Fine-grained Apparel Classification and Retrieval without rich annotations

Online Cost Efficient Customer Recognition System for Retail Analytics

Large-Scale Product Classification via Spatial Attention Based CNN Learning and Multi-class Regression