Abstract: A sememe is the minimum unambiguous semantic unit in human language. In a sememe knowledge base, the semantics of each word sense are encoded and expressed as a sememe tree. Sememe knowledge benefits many NLP tasks, but constructing a sememe knowledge base manually is time-consuming. One existing work touches on sememe tree prediction, but it has two limitations: it uses the word rather than the word sense as the unit, and its method handles only words that have dictionary definitions, not all words. In this article, we use English-Chinese bilingual information to help disambiguate word senses. We propose the task of sememe tree prediction for English-Chinese word pairs, which can automatically extend the well-known knowledge base HowNet, and we propose two methods for it. For a given word pair with a categorial sememe, the first method starts from the root node and uses neural networks to generate edges and nodes step by step in depth-first order. The second is a recommendation-based method: for a given word pair with a categorial sememe, we find word pairs that share the same categorial sememe and are semantically similar to it, and construct a propagation function that transfers the sememe tree information of these word pairs to the word pair to be predicted. Experiments show that our method is effective, reaching an F1 of 84.0%. Further, using the Oxford English-Chinese Bilingual Dictionary as data, we add about 90,000 word pairs to HowNet, expanding it by nearly half.
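The recommendation-based method can be illustrated with a minimal sketch. The abstract does not specify the propagation function, so everything below is an assumption made for illustration: sememe trees are represented as sets of (parent, relation, child) edges, word pairs as plain embedding vectors, similarity as cosine, and the hypothetical function `propagate_edges` keeps an edge when its similarity-weighted vote among the nearest neighbours reaches a threshold. The sememe and relation labels are placeholders, not entries taken from HowNet.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_edges(target_vec, neighbours, top_k=2, threshold=0.6):
    """Score candidate tree edges for a target word pair (illustrative only).

    neighbours: list of (vector, edges) for word pairs that share the
    target's categorial sememe; edges is a set of (parent, relation, child).
    An edge is kept if its similarity-weighted share of the vote among
    the top_k most similar neighbours is at least `threshold`.
    """
    ranked = sorted(
        neighbours, key=lambda n: cosine(target_vec, n[0]), reverse=True
    )[:top_k]

    scores = defaultdict(float)
    total = 0.0
    for vec, edges in ranked:
        weight = max(cosine(target_vec, vec), 0.0)  # ignore negative similarity
        total += weight
        for edge in edges:
            scores[edge] += weight

    if total == 0.0:
        return set()
    return {edge for edge, s in scores.items() if s / total >= threshold}


# Toy data: all three neighbours share the (assumed) categorial sememe "human".
target = [1.0, 0.0]
neighbours = [
    ([1.0, 0.0], {("human", "HostOf", "occupation"),
                  ("human", "domain", "education")}),
    ([0.8, 0.6], {("human", "HostOf", "occupation")}),
    ([0.0, 1.0], {("human", "modifier", "sick")}),
]

predicted = propagate_edges(target, neighbours)
print(predicted)
```

In this toy setup, only the edge shared by both of the two most similar neighbours passes the threshold. A real system would need embeddings that reflect both the English and the Chinese side of the pair, and a check that the retained edges still form a tree rooted at the categorial sememe.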

Research On Semantic Disambiguation In Treebank

Parsing-based Chinese word segmentation integrating morphological and syntactic information

Chinese Semantic Dependency Relation System and Treebank Construction.

Chinese Dependency Parsing Based on Treebank

Exploiting Heterogeneous Treebanks for Parsing.

Research on the Application of a Chinese Semantic Knowledge Base in Chinese Phrase Disambiguation

Improving Chinese Dependency Parsing with Lexical Semantic Features

Semantic Disambiguation of Chinese Homonyms in pinyin-hanzi Conversion

Semantic Parsing for English as a Second Language

Lexical Issues in Chinese Information Processing: in the Background of Sentence-based Diagram Treebank Construction

A Language Model for Word Sense Disambiguation

Application of the transformer model algorithm in Chinese word sense disambiguation: a case study in Chinese language

Improved parsing with taxonomy of conjunctions

Towards Accurate and Efficient Chinese Part-of-Speech Tagging.

Improve Chinese Semantic Dependency Parsing Via Syntactic Dependency Parsing

Two Language Models Using Chinese Semantic Parsing

Sememe Tree Prediction for English-Chinese Word Pairs.

Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations

Chinese WSD based on features obtaining with shallow parsing

Learning Semantic Neural Tree for Human Parsing

Dependency-Gated Cascade Biaffine Network for Chinese Semantic Dependency Graph Parsing