Abstract: Recent advances in natural language processing have catalysed active research in the machine learning and computational linguistics communities on algorithms that generate contextual vector representations of words, known as word embeddings. Existing work pays little attention to the patterns of words, which encode rich semantic information and impose semantic constraints on a word's context. This paper explores the feasibility of integrating word embeddings with pattern grammar, a grammar model that describes the syntactic environment of lexical items. Specifically, it develops a method to extract patterns that carry the semantic information of word embeddings, and it investigates the statistical regularities and distributional semantics of the extracted patterns. The main results are as follows. Experiments on the Lancaster Corpus of Mandarin Chinese (LCMC) show that pattern frequency follows Zipf's law and that pattern frequency and pattern length are inversely related. The proposed method thus enables the study of the distributional properties of patterns in large-scale corpora. Furthermore, experiments show that the extracted patterns impose semantic constraints on context, indicating that patterns encode rich semantic and contextual information. These findings point to potential applications of pattern-based word embeddings in a wide range of natural language processing tasks.
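To make the two statistical regularities concrete, here is a minimal sketch that estimates a Zipf exponent and the correlation between pattern length and frequency from pattern counts. Zipf's law states that the frequency f(r) of the item at rank r is roughly proportional to r^(-α), so log f is approximately linear in log r with slope -α. The sketch assumes patterns have already been extracted and counted; the pattern labels, the counts, and the helper names `zipf_exponent` and `pearson` are hypothetical, and the ordinary least-squares fit on log-log ranks is a simple swapped-in estimator, not necessarily the procedure used in the paper.

```python
import math
from collections import Counter


def zipf_exponent(frequencies):
    """Estimate the Zipf exponent alpha from raw frequencies.

    Sorts frequencies in descending order, assigns ranks 1..n, and fits
    log f = c - alpha * log r by ordinary least squares (an illustrative
    estimator chosen for this sketch, not taken from the paper).
    """
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope is -alpha, so negate it


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pattern counts: each pattern is a tuple of pattern-grammar
# elements, so a pattern's length is its number of elements.
pattern_counts = Counter({
    ("V", "n"): 120,
    ("V", "to-inf"): 60,
    ("V", "n", "n"): 40,
    ("V", "n", "to-inf"): 30,
    ("V", "n", "prep", "n"): 10,
})

alpha = zipf_exponent(pattern_counts.values())
lengths = [len(p) for p in pattern_counts]
log_freqs = [math.log(c) for c in pattern_counts.values()]
r = pearson(lengths, log_freqs)
print(f"Zipf exponent: {alpha:.2f}; length vs. log-frequency r = {r:.2f}")
```

A negative `r` is consistent with the inverse relation between pattern frequency and pattern length described in the abstract. On real corpus data many more patterns would be needed, and a maximum-likelihood power-law estimator may be preferable to a log-log least-squares fit when a rigorous exponent is required.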

Investigating Language Universal and Specific Properties in Word Embeddings

Enhanced Double-Carrier Word Embedding Via Phonetics and Writing

Improve Word Embedding Using Both Writing and Pronunciation

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

Field Embedding: A Unified Grain-Based Framework for Word Representation

Visual Exploration and Comparison of Word Embeddings

From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models

Learning Word Embeddings from Intrinsic and Extrinsic Views

Learning Context-Specific Word/Character Embeddings

An Exploration Of Semantic Relations In Neural Word Embeddings Using Extrinsic Knowledge

A Word Embedding Model for Analyzing Patterns and Their Distributional Semantics

Low-dimensional Semantic Space: from Text to Word Embedding

Evaluating Word Embedding Models: Methods and Experimental Results

Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Representation Of Lexical Stylistic Features In Language Models' Embedding Space

Language Embeddings Sometimes Contain Typological Generalizations

Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective

Language Models are Universal Embedders

Category Enhanced Word Embedding

Compressing and Interpreting Word Embeddings with Latent Space Regularization and Interactive Semantics Probing