Abstract: Natural language processing is one of the most challenging areas of artificial intelligence and is widely used in real-life applications. One of its basic questions is how to calculate the probability that a particular text sequence appears in a given context. Word2Vec addresses this question by transforming words into word vectors, and it trains efficiently on large datasets and corpora. Of its models, Continuous-Bag-Of-Words and Skip-gram are the most significant and the most widely known. Several extended techniques have also been proposed for these models to reduce training time and increase accuracy at the same time. Although a number of papers now describe these fundamental concepts, their quality varies greatly. To better understand the models and their extensions, and how well they behave on real tasks, this paper compares different combinations of the models and techniques in terms of their performance when processing large input data and their prediction accuracy in text classification. The comparison is intended to provide more detail on, and a better understanding of, the models for subsequent research in this field.
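
As a rough illustration of how the two models named in the abstract differ, the sketch below builds training examples from a token sequence. Continuous-Bag-Of-Words (CBOW) predicts a centre word from its surrounding context words, while Skip-gram predicts each context word from the centre word. The function names, the `window` parameter, and the toy sentence are assumptions made for this example and are not taken from the paper; no actual vector training is shown.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW-style examples: (context words, centre word to predict)."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram-style examples: (centre word, one context word to predict)."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


if __name__ == "__main__":
    # Hypothetical toy sentence, used only to show the shape of the examples.
    sentence = ["the", "cat", "sat", "down"]
    for context, target in cbow_examples(sentence, window=1):
        print("CBOW     ", context, "->", target)
    for centre, context_word in skipgram_examples(sentence, window=1):
        print("Skip-gram", centre, "->", context_word)
```

With the same window, Skip-gram yields one example per (centre, context) pair and therefore more examples than CBOW, which yields one example per position.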

Continuous Word Embeddings For Detecting Local Text Reuses At The Semantic Level

A new video text detection method

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

New Word Identification in Social Network Text Based on Time Series Information

Semantic Word Cloud Generation Based on Word Embeddings

A Semantic Relation Preserved Word Embedding Reuse Method

Detecting new Chinese words from massive domain texts with word embedding

Learning hash codes for efficient content reuse detection

Fast Extraction of Word Embedding from Q-contexts

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Learning Semantic Hierarchies: a Continuous Vector Space Approach

A Mixed Generative-Discriminative Based Hashing Method

Revisiting Embedding Features for Simple Semi-supervised Learning

Local Word Bag Model for Text Categorization

Continuous-bag-of-words and Skip-gram for word vector training and text classification

Discrete Joint Semantic Alignment Hashing for Cross-Modal Image-Text Search

Exploiting Global Semantic Similarity Biterms for Short-Text Topic Discovery

Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

Research on Chinese Semantic Similarity Algorithm

Efficiently Identifying Watermarked Segments in Mixed-Source Texts