Exploiting Word Embedding for Heterogeneous Topic Model Towards Patent Recommendation.

Anhui University, Tang Jie
DOI: https://doi.org/10.1007/s11192-020-03666-4
IF: 3.801
2020-01-01
Scientometrics
Abstract: Patent recommendation aims to recommend patent documents that have similar content to a given target patent. With the explosive growth in patent applications, how to recommend relevant patents from the massive number of patents has become an extremely challenging problem. The main obstacle in patent recommendation is how to distinguish the meanings of the same word in different contexts or associate multiple words that express the same meaning. In this paper, we propose a Heterogeneous Topic model exploiting Word embedding to enhance word semantics (HTW). First, we model the relationship among text, inventors, and applicants around the topic to build a heterogeneous topic model and learn the patent feature representation to capture contextual word semantics. Second, a word embedding is constructed to extract the deep semantics for associating multiple words that express the same meaning. Finally, with words as connections, the mapping from patent feature representations to patent embedding is established through a matrix operation, which integrates the information between the word embedding and patent feature representation. HTW considers the heterogeneity of patents and enhances the distinction or association among words simultaneously. The experimental results on real-world datasets show that HTW exceeds typical keyword-based methods, topic models, and embedding models on patent recommendations.
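The abstract says only that patent feature representations are mapped to patent embeddings "through a matrix operation" with words as the connection; it does not give the formula. One plausible reading, sketched below purely as an illustration, chains a patent-topic matrix, a topic-word matrix, and a word-embedding matrix, then ranks patents by cosine similarity. The names `theta`, `phi`, `word_vectors`, and `recommend`, the toy values, and the specific product are all assumptions, not the paper's stated method.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical toy inputs (not from the paper).
# theta: patent-topic weights, 3 patents x 2 topics.
theta = [[0.9, 0.1],
         [0.8, 0.2],
         [0.1, 0.9]]
# phi: topic-word weights, 2 topics x 4 vocabulary words.
phi = [[0.50, 0.40, 0.05, 0.05],
       [0.05, 0.05, 0.40, 0.50]]
# word_vectors: word embeddings, 4 words x 2 dimensions.
word_vectors = [[1.0, 0.0],
                [0.9, 0.1],
                [0.1, 0.9],
                [0.0, 1.0]]

# Assumed mapping: words connect the topic-based patent features
# to the embedding space via (theta * phi) * word_vectors.
patent_word = matmul(theta, phi)
patent_embeddings = matmul(patent_word, word_vectors)

def recommend(target, embeddings, top_n=1):
    """Return indices of the top_n patents most similar to the target."""
    scores = [(cosine(embeddings[target], e), i)
              for i, e in enumerate(embeddings) if i != target]
    return [i for _, i in sorted(scores, reverse=True)[:top_n]]
```

In this toy setup, patents 0 and 1 share a dominant topic, so `recommend(0, patent_embeddings)` ranks patent 1 ahead of patent 2. A real system would learn `theta` and `phi` from the heterogeneous topic model and `word_vectors` from a trained word-embedding model.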