Real-Time Fashion-Guided Clothing Semantic Parsing: A Lightweight Multi-Scale Inception Neural Network and Benchmark.

Yuhang He, Lu Yang, Long Chen
2017-01-01
Abstract: Two barriers currently hinder clothing semantic parsing research: existing methods are time-consuming, and there is no large publicly available dataset that enables parsing at multiple scales. To address both, we adopt deep learning and design a lightweight multi-scale inception neural network that uses multi-scale inception both inside and outside the network during training. An atrous convolution block is added to enlarge the field of view without adding computation cost or parameters. The pre-trained model is then pruned and compressed by fine-tuning a lightweight version of the same network, in which inactive feature responses and connections below a pre-defined threshold are removed directly. In addition, we construct the largest fashion-guided clothing semantic parsing dataset to date (FCP), which contains 5,000 clothing images, each with pixel-level, object-level, and image-level annotations. All clothing in the dataset is recommended by fashion experts or trendsetters, and the dataset covers as many as 65 common clothing items and accessories. We organize the dataset as a WordNet tree structure so that it supports hierarchical fashion parsing. Finally, we conduct extensive experiments on three currently available datasets. Both quantitative and qualitative results demonstrate the superiority and feasibility of our method compared with several other deep learning based methods. Our method achieves 35 FPS on a single Nvidia Titan X GPU with only minimal accuracy loss.
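The two mechanisms named in the abstract, atrous convolution and threshold-based pruning, can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function names, the 1-D setting, and the choice of weight magnitude as the pruning criterion are all assumptions made for illustration only.

```python
def receptive_field(kernel_size, rate):
    """Input span covered by one output of a kernel with `rate` spacing between taps."""
    return (kernel_size - 1) * rate + 1


def dilated_conv1d(signal, kernel, rate=1):
    """Valid 1-D atrous (dilated) correlation.

    The kernel keeps the same number of weights; only the spacing between
    the input samples it reads grows with `rate`.
    """
    span = receptive_field(len(kernel), rate)
    return [
        sum(w * signal[i + k * rate] for k, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def prune_by_threshold(weights, threshold):
    """Zero out weights whose magnitude falls below `threshold` (assumed criterion)."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]  # three weights at every dilation rate

dense = dilated_conv1d(signal, kernel, rate=1)   # each output sees 3 inputs
atrous = dilated_conv1d(signal, kernel, rate=2)  # each output sees 5 inputs
pruned = prune_by_threshold([0.5, -0.01, 0.2, 0.003], threshold=0.05)
```

With the same three weights, raising the rate from 1 to 2 widens the span seen by each output from 3 to 5 samples, which is the sense in which the field of view grows without extra parameters. The pruning helper only shows the thresholding idea; the paper's fine-tuning step is not reproduced here.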