Multi-label Classification and Interactive NLP-based Visualization of Electric Vehicle Patent Data

Djavan De Clercq, Ndeye-Fatou Diop, Devina Jain, Benjamin Tan, Zongguo Wen
DOI: https://doi.org/10.1016/j.wpi.2019.101903
2019-01-01
World Patent Information
Abstract: The objectives of this study are to (1) interactively visualize information embedded in patent texts, and (2) train a high-accuracy multi-label classification algorithm capable of classifying patents into multiple Cooperative Patent Classification (CPC) classes. The case study involved metadata and text data of 17,500 electric vehicle patents. To these ends, the following methodology was applied. First, feature engineering was based on topic extraction from patent texts using latent Dirichlet allocation (LDA) and the perplexity metric. Second, the multi-label implementations of the random forest, decision tree, and k-nearest neighbors (KNN) algorithms were trained on the data in order to predict multiple class labels corresponding to a given electric vehicle patent. The results of this study were promising, with the best scores for performance metrics such as accuracy, precision, recall, F-score, and Hamming loss being 0.91, 0.92, 0.74, and 0.02 respectively. The implications of our results are two-fold. First, we demonstrate the effectiveness of using open-source tools for customized patent analysis pipelines that include interactive data visualization and machine learning. Second, our results provide a strong basis for automated multi-label patent classification into CPC classes.
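As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes them for a multi-label setting in plain Python. It is not the authors' code. The function name `multilabel_metrics`, the toy patents, and the choice of CPC-style labels are hypothetical, and the use of subset accuracy and micro-averaging is an assumption, since the abstract does not state which averaging scheme was used.

```python
from typing import Dict, List, Set


def multilabel_metrics(
    y_true: List[Set[str]],
    y_pred: List[Set[str]],
    labels: List[str],
) -> Dict[str, float]:
    """Compute common multi-label metrics from sets of true and predicted labels.

    Assumptions (not taken from the paper): accuracy is subset accuracy
    (exact match of the whole label set), and precision, recall, and
    F-score are micro-averaged over all (patent, label) pairs.
    """
    tp = fp = fn = exact = 0
    for true, pred in zip(y_true, y_pred):
        tp += len(true & pred)      # labels correctly predicted
        fp += len(pred - true)      # labels predicted but not true
        fn += len(true - pred)      # true labels that were missed
        exact += int(true == pred)  # whole label set matches

    n = len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "subset_accuracy": exact / n,
        "precision_micro": precision,
        "recall_micro": recall,
        "f1_micro": f1,
        # Hamming loss: fraction of (patent, label) slots that are wrong.
        "hamming_loss": (fp + fn) / (n * len(labels)),
    }


# Hypothetical toy data: four patents, four CPC-style class labels.
labels = ["B60L", "H01M", "H02J", "Y02T"]
y_true = [{"B60L", "Y02T"}, {"H01M"}, {"B60L", "H02J"}, {"H01M", "Y02T"}]
y_pred = [{"B60L", "Y02T"}, {"H01M", "Y02T"}, {"B60L"}, {"H01M", "Y02T"}]

scores = multilabel_metrics(y_true, y_pred, labels)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Hamming loss is lower-is-better, which is why a value such as 0.02 can sit alongside high accuracy and precision: it counts individual wrong label decisions rather than whole patents that were misclassified.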