Abstract: Consistent word interpretations across time spans are crucial in social sciences research and text analysis, because stable semantic representations underpin the correctness of downstream tasks and support the analysis of socio-political and cultural phenomena. Traditional models such as Word2Vec have provided significant insights into long-term semantic change but often struggle to capture stable meanings in short-term contexts, possibly because unbalanced training data causes their embeddings to fluctuate. BERT (Bidirectional Encoder Representations from Transformers), with its pre-training and transformer encoder architecture, offers contextual embeddings that improve semantic consistency, making it a promising tool for short-term analysis. This study empirically compares how well Word2Vec and BERT maintain stable word meanings over time in text analysis tasks relevant to social sciences research. Using articles from the People's Daily spanning 20 years (2004-2023), we evaluated the semantic stability of each model across different timeframes. The results indicate that BERT consistently outperforms Word2Vec in maintaining semantic stability. However, this same stability limits BERT's ability to capture gradual semantic shifts over longer periods. The findings suggest that while BERT is advantageous for short-term semantic analysis in social sciences, researchers should consider complementary approaches for long-term studies to fully capture semantic drift. This research underscores the importance of selecting a word embedding model according to the temporal context of the social science analysis.
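
The abstract does not specify how semantic stability is measured. As a minimal sketch of one common choice, the example below scores a word by the mean cosine similarity between its embeddings in consecutive time periods. The function names, the per-year keys, and the toy two-dimensional vectors are all hypothetical. The sketch also assumes the vectors for every period already live in a shared space; Word2Vec models trained separately per period would typically need an alignment step (for example, orthogonal Procrustes) first.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def semantic_stability(vectors_by_period: Dict[str, Sequence[float]]) -> float:
    """Mean cosine similarity between one word's embeddings in consecutive periods.

    Assumption (not from the source): period keys sort chronologically and
    all vectors are already expressed in the same embedding space.
    """
    periods = sorted(vectors_by_period)
    if len(periods) < 2:
        raise ValueError("need at least two periods to measure stability")
    similarities = [
        cosine_similarity(vectors_by_period[a], vectors_by_period[b])
        for a, b in zip(periods, periods[1:])
    ]
    return sum(similarities) / len(similarities)


# Toy vectors for illustration only; real embeddings would come from the models.
stable_word = {"2004": [1.0, 0.0], "2005": [1.0, 0.0], "2006": [1.0, 0.0]}
drifting_word = {"2004": [1.0, 0.0], "2005": [0.0, 1.0], "2006": [-1.0, 0.0]}

print(semantic_stability(stable_word))
print(semantic_stability(drifting_word))
```

Under this hypothetical metric, a word whose embedding never moves scores 1, while a word whose embedding rotates by a right angle each period scores 0. Shorter or longer timeframes can be compared by changing how the periods are bucketed.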

Explaining and Improving BERT Performance on Lexical Semantic Change Detection

Effects of Pre- and Post-Processing on type-based Embeddings in Lexical Semantic Change Detection

Integrating Semantic Information into Sketchy Reading Module of Retro-Reader for Vietnamese Machine Reading Comprehension

IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection

Semantics-aware BERT for Language Understanding

Optimizing small BERTs trained for German NER

Probing Pretrained Language Models for Lexical Semantics

Achieving Semantic Consistency Using BERT: Application of Pre-training Semantic Representations Model in Social Sciences Research

SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models

Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings

Graph-based Clustering for Detecting Semantic Change Across Time and Languages

BERTwich: Extending BERT's Capabilities to Model Dialectal and Noisy Text

Performance and sustainability of BERT derivatives in dyadic data

Advancing Domain Adaptation of BERT by Learning Domain Term Semantics

A Closer Look at How Fine-tuning Changes BERT

Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering

Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT

A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT

SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change

Transfer Fine-Tuning: A BERT Case Study

Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks