A Deep Forest Method for Classifying E-Commerce Products by Using Title Information

Jin Dai, Tianyu Wang, Shaowei Wang
DOI: https://doi.org/10.1109/icnc47757.2020.9049751
2020-01-01
Abstract: E-commerce platforms such as Amazon, eBay, and Tmall host a vast range of products. These platforms must classify products to support product management and recommendation, but manual classification is costly. Recently, machine-learning classifiers, e.g. support vector machines (SVM) and deep learning (DL), have been widely used in industry to classify e-commerce products from the text of merchant-supplied titles. However, current techniques can be inefficient and inaccurate when the number of categories is large and the dataset is small, as is the case in e-commerce product classification. In this paper, we propose a novel machine learning method for the problem, referred to as gcForest, which combines a cascade forest of decision trees with a multi-grained scanning mechanism. After preprocessing the product titles with a word examination technique, the TF-IDF algorithm, we carry out a series of experiments on 4000 samples belonging to 35 product categories. The experimental results show that gcForest achieves a classification accuracy of 92.38%, outperforming SVM with an RBF kernel (86.88%), SVM with a linear kernel (89.73%), and a CNN (86.86%).
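The abstract names TF-IDF as the step that turns product titles into features before classification. The sketch below shows one common way to compute such weights in plain Python. It is an illustration only and not the authors' implementation: the whitespace tokenizer, the unsmoothed formula tf(t, d) * log(N / df(t)), the function name `tfidf`, and the three example titles are all assumptions, since the abstract does not specify the tokenization or the TF-IDF variant used.

```python
import math
from collections import Counter


def tfidf(titles):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in titles[i].

    Assumed variant: tf = term count / title length, idf = log(N / document frequency).
    """
    # Hypothetical tokenizer: lowercase and split on whitespace.
    docs = [title.lower().split() for title in titles]
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)

    # Document frequency: number of titles that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    idf = {word: math.log(n_docs / df[word]) for word in vocab}

    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for an empty title
        rows.append([counts[word] / length * idf[word] for word in vocab])
    return vocab, rows


# Made-up titles for illustration; not taken from the paper's dataset.
titles = [
    "wireless bluetooth headphones",
    "wireless gaming mouse",
    "cotton summer dress",
]
vocab, rows = tfidf(titles)
```

In this toy example, a word that appears in only one title (such as "bluetooth") receives a higher weight than a word shared by several titles (such as "wireless"), which is the property that makes the weights useful as classifier input. Each row of `rows` could then be passed as a feature vector to a classifier such as an SVM or a forest-based model.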