Abstract: Consider a question and its answers in Stack Overflow as a knowledge unit. Knowledge units often contain semantically relevant knowledge and are thus linkable for different purposes: as duplicate questions, as directly linkable units for problem solving, or as indirectly linkable units for related information. Recognizing different classes of linkable knowledge would support more targeted information needs when users search or explore the knowledge base. Existing methods focus on binary relatedness (i.e., related or not) and are not robust enough to recognize different classes of semantic relatedness when linkable knowledge units share few words in common (i.e., when there is a lexical gap). In this paper, we formulate the problem of predicting semantically linkable knowledge units as a multiclass classification problem and solve it using deep learning techniques. To overcome the lexical gap, we adopt a neural language model (word embeddings) and a convolutional neural network (CNN) to capture word-level and document-level semantics of knowledge units. Instead of using human-engineered classifier features, which are hard to design for informal user-generated content, we exploit large numbers of different types of user-created knowledge-unit links to train the CNN to learn the most informative word-level and document-level features for the multiclass classification task. Our evaluation shows that our deep-learning-based approach significantly and consistently outperforms traditional methods that use traditional word representations and human-engineered classifier features.
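The pipeline the abstract describes (word embeddings, a CNN that pools word-level features into a document-level vector, and a multiclass output) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the weights are random and untrained, the names `init_params`, `encode`, and `predict` are hypothetical, the class list (including a fourth "isolated" class for non-linkable pairs) is an assumption, and combining the two document vectors by concatenation is an illustrative choice rather than something the abstract specifies.

```python
import numpy as np

# Assumed label set: the abstract names three linkable classes;
# "isolated" (not linkable) is added here as an assumption.
CLASSES = ["duplicate", "direct", "indirect", "isolated"]


def init_params(vocab, dim=8, n_filters=6, width=3, seed=0):
    """Random, untrained parameters (hypothetical sizes, for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        # Row 0 of the embedding table is reserved for unknown words.
        "emb": rng.normal(size=(len(vocab) + 1, dim)),
        "filters": rng.normal(size=(n_filters, width, dim)),
        "f_bias": np.zeros(n_filters),
        "W": rng.normal(size=(len(CLASSES), 2 * n_filters)),
        "b": np.zeros(len(CLASSES)),
    }


def encode(tokens, vocab, params):
    """Word embeddings -> 1-D convolution -> ReLU -> max-over-time pooling."""
    ids = [vocab.get(t, 0) for t in tokens]
    x = params["emb"][ids]                      # (n_tokens, dim)
    width = params["filters"].shape[1]
    if x.shape[0] < width:                      # zero-pad very short texts
        pad = np.zeros((width - x.shape[0], x.shape[1]))
        x = np.vstack([x, pad])
    # All sliding windows of `width` consecutive word vectors.
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    # Each filter scores each window: result is (n_windows, n_filters).
    feats = np.tensordot(windows, params["filters"], axes=([1, 2], [1, 2]))
    feats = np.maximum(feats + params["f_bias"], 0.0)
    return feats.max(axis=0)                    # document-level vector


def predict(tokens_a, tokens_b, vocab, params):
    """Class probabilities for a pair of knowledge units."""
    pair = np.concatenate([encode(tokens_a, vocab, params),
                           encode(tokens_b, vocab, params)])
    logits = params["W"] @ pair + params["b"]
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()
```

With a toy vocabulary such as `{"java": 1, "json": 2, "parse": 3}`, calling `predict(["parse", "json", "java"], ["java", "json", "library"], vocab, init_params(vocab))` returns four probabilities, one per entry in `CLASSES`. In a trained model, the embedding table, filters, and output layer would be learned from user-created knowledge-unit links, as the abstract describes.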

Embedding for Words and Word Senses Based on Human Annotated Knowledge Base: A Case Study on HowNet

Incorporating Knowledge into Neural Network for Text Representation.

Predicting Semantically Linkable Knowledge In Developer Online Forums Via Convolutional Neural Network

Do Multi-Sense Embeddings Improve Natural Language Understanding?

An Exploration Of Semantic Relations In Neural Word Embeddings Using Extrinsic Knowledge

Leveraging Human Prior Knowledge to Learn Sense Representations

Try to Substitute: an Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet.

KNET: A General Framework for Learning Word Embedding using Morphological Knowledge

OpenHowNet: An Open Sememe-based Lexical Knowledge Base.

Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources.

Chinese Word Sense Embedding with SememeWSD and Synonym Set

Contextualized Word Embeddings Encode Aspects of Human-Like Word Sense Knowledge

Sememe Knowledge Computation: a Review of Recent Advances in Application and Expansion of Sememe Knowledge Bases

Research on the Modeling of Semantic-Based Web Resources Feature.

Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation

Lexical Sememe Prediction Via Word Embeddings and Matrix Factorization.

Constructing High Quality Sense-specific Corpus and Word Embedding Via Unsupervised Elimination of Pseudo Multi-sense.

Together We Make Sense -- Learning Meta-Sense Embeddings from Pretrained Static Sense Embeddings

Enhancing Semantic Word Representations by Embedding Deeper Word Relationships

xSense: Learning Sense-Separated Sparse Representations and Textual Definitions for Explainable Word Sense Networks

Distance Based Korean WordNet(alias. KorLex) Embedding Model