Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step Towards the Screenless Retailing

Yongshun Gong, Jinfeng Yi, Dong-Dong Chen, Jian Zhang, Jiayu Zhou, Zhi-Hua Zhou
DOI: https://doi.org/10.1145/3474085.3481538
2021-01-01
Abstract: Nowadays, almost all online orders are placed through screened devices such as mobile phones, tablets, and computers. With the rapid development of the Internet of Things (IoT) and smart appliances, more and more screenless smart devices, e.g., smart speakers and smart refrigerators, are appearing in our daily lives. They open up new means of interaction and may provide an excellent opportunity to reach new customers and increase sales. However, not all items are suitable for screenless shopping, since the appearance of some items plays an important role in consumer decision making. Typical examples include clothes, dolls, bags, and shoes. In this paper, we aim to infer the significance of every item's appearance in consumer decision making and identify the group of items that are suitable for screenless shopping. Specifically, we formulate the problem as a classification task that predicts whether an item's appearance has a significant impact on people's purchase behavior. To solve this problem, we extract multi-modal features from three different views and collect a set of necessary labels via crowdsourcing. We then propose an iterative semi-supervised learning framework with a carefully designed multi-modal enhancement module. Experimental results verify the effectiveness of the proposed method.