On the performance of phonetic algorithms in microtext normalization

Yerai Doval, Manuel Vilares, Jesús Vilares
DOI: https://doi.org/10.1016/j.eswa.2018.07.016
2024-02-05
Abstract: User-generated content published on microblogging social networks constitutes a priceless source of information. However, microtexts usually deviate from the standard lexical and grammatical rules of the language, which makes them very difficult for traditional intelligent systems to process. In response, microtext normalization transforms these non-standard microtexts into standard, well-written text as a preprocessing step, allowing traditional approaches to continue with their usual processing. Given the importance of phonetic phenomena in the formation of non-standard text, an essential element of a normalizer's knowledge base is the set of phonetic rules that encode these phenomena, which can be found in so-called phonetic algorithms.
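To make the idea of a phonetic algorithm concrete, the sketch below implements a simplified American Soundex in Python. Soundex is chosen here only as a widely known example; the abstract does not say which phonetic algorithms the paper evaluates, and the function name `soundex` and the sample words are illustrative assumptions rather than material from the paper.

```python
# Simplified American Soundex, shown only as a familiar example of a
# phonetic algorithm. Assumption: the source does not name this algorithm.

# Map each consonant to its Soundex digit; vowels, y, h and w have no digit.
_CODES = {}
for _digit, _letters in {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return a 4-character Soundex key, or "" if the word has no ASCII letters."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = _CODES.get(first, "")  # the first letter's digit also blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated digit is kept after it
    return (encoded + "000")[:4]  # pad with zeros, then truncate to 4 characters


# A non-standard spelling and its standard form can share the same key.
print(soundex("thanx"), soundex("thanks"))
print(soundex("tomoro"), soundex("tomorrow"))
```

In a normalizer, keys like these could be used to retrieve standard-vocabulary candidates that sound like a non-standard token. A shared key only suggests phonetic similarity, so a real system would still need to rank the candidates by other means.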
Computation and Language