A Survey of Word Embedding Algorithms for Textual Data Information Extraction

Eugen Vušak, Vjeko Kužina, Alan Jović
DOI: https://doi.org/10.23919/mipro52101.2021.9597076
2021-09-27
Abstract: Unlike other popular data types, such as images, textual data cannot be easily converted into a numerical form that machine learning algorithms can process. Therefore, text must be embedded into a vector space using embedding algorithms. These algorithms attempt to encapsulate as much information as possible from the text into a resulting vector space. Natural language is complex and contains numerous layers of information. Information can be obtained from a sequence of characters or subword units that make up the word. It can also be derived from the context in which a word occurs. For this reason, a variety of word embedding algorithms have been developed over time, which use different pieces of information in different ways. In this paper, the currently available word embedding algorithms are described and it is shown what kind of information these algorithms use. After analyzing these algorithms, we discuss how it can be advantageous to use combinations of different types of information in different research and application areas.