A Classification Method for Chinese Word Semantic Relations Based on TF-IDF and CNN.

Teng Mao, Yuanyuan Peng, Yuru Jiang, Yangsen Zhang
DOI: https://doi.org/10.1007/978-3-030-04015-4_43
2018-01-01
Abstract: Classifying the semantic relations between words is an important part of semantic analysis in natural language research. Automating this classification matters for constructing knowledge graphs and for information retrieval. The NLPCC2017 shared task on Chinese Word Semantic Relations Classification divides semantic relations into four categories: synonym, antonym, hyponym, and meronym. This paper presents a classification method for Chinese word semantic relations based on TF-IDF and CNN that uses both the literal and the semantic features of words. Four new literal features are proposed, including whether one word is part of the other and the ratio of their common substring. Semantic features are extracted in four steps: training a word vector model on the BaiduBaike corpus, selecting the set of words most related to a given word from BaiduBaike based on TF-IDF, constructing a vector matrix for that set of related words, and using a CNN to obtain the semantic features of the given word from the vector matrix. Experiments on the NLPCC2017 dataset show an F1-score of up to 83.91%, indicating that the method is effective at reducing the influence of out-of-vocabulary (OOV) words.
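As a rough illustration of two ideas named in the abstract, the sketch below computes two literal features for a word pair and ranks the tokens of one document by TF-IDF. It is not the authors' implementation: the abstract does not give exact definitions, so the normalisation of the common-substring ratio (by the longer word), the smoothed IDF formula, and the names `literal_features` and `top_related_words` are all assumptions made for this example.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def literal_features(w1: str, w2: str) -> dict:
    """Two illustrative literal features for a word pair (definitions assumed)."""
    # Feature 1: is either word contained in the other?
    contains = float(w1 in w2 or w2 in w1)
    # Feature 2: longest common substring length, normalised by the longer
    # word. The normalisation is an assumption, not taken from the paper.
    matcher = SequenceMatcher(None, w1, w2, autojunk=False)
    match = matcher.find_longest_match(0, len(w1), 0, len(w2))
    ratio = match.size / max(len(w1), len(w2)) if w1 and w2 else 0.0
    return {"contains": contains, "common_ratio": ratio}


def top_related_words(doc_tokens, corpus, k=3):
    """Rank the tokens of one document by TF-IDF against a list of token lists."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed IDF (an assumed variant; the paper may use another).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


pair = literal_features("电脑", "笔记本电脑")
corpus = [["a", "b", "b"], ["a", "c"], ["a", "d"]]
best = top_related_words(corpus[0], corpus, k=1)
```

In the pipeline the abstract describes, the words selected by TF-IDF would then be mapped to their vectors and stacked into the matrix that the CNN consumes; that step is omitted here because it depends on a trained word vector model.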