Text data-augmentation using Text Similarity with Manhattan Siamese long short-term memory for Thai language

Thananya Phreeraphattanakarn, Boonserm Kijsirikul
DOI: https://doi.org/10.1088/1742-6596/1780/1/012018
2021-02-01
Journal of Physics: Conference Series
Abstract: In this paper, we address the problem of training neural networks on small text datasets. Data augmentation is commonly used with image and sound datasets to improve model performance, and we apply the same idea to expand a training set of textual data. The main challenge with our dataset is that it contains too little data to train models. We therefore propose a text augmentation method designed specifically for the Thai language. The method is based on text similarity and uses a model to determine the semantic relationship between two sentences. The experimental results indicate that the proposed method improves text classification performance.
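
To make the similarity measure named in the title concrete, here is a minimal sketch of the Manhattan similarity usually paired with Siamese LSTM encoders: exp of the negative L1 distance between two sentence encodings. The abstract does not give the exact formula or how the score drives augmentation, so the function names, the threshold value, and the filtering step are all hypothetical illustrations, not the authors' implementation. The encodings are plain lists of floats standing in for the outputs of the two LSTM branches.

```python
import math


def manhattan_similarity(h_a, h_b):
    """Return exp(-||h_a - h_b||_1), a score in (0, 1].

    h_a and h_b stand in for the final hidden states produced by the two
    branches of a Siamese encoder (assumed here; no encoder is included).
    """
    if len(h_a) != len(h_b):
        raise ValueError("encodings must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1_distance)


def filter_candidates(anchor, candidates, threshold=0.8):
    """Keep candidate encodings whose similarity to the anchor meets the threshold.

    This filtering step and the threshold of 0.8 are assumptions made for
    illustration; the abstract does not describe the selection rule.
    """
    return [c for c in candidates if manhattan_similarity(anchor, c) >= threshold]


if __name__ == "__main__":
    anchor = [0.10, 0.20, 0.30]
    close = [0.10, 0.25, 0.30]   # small L1 distance, high similarity
    far = [1.00, -1.00, 2.00]    # large L1 distance, low similarity
    kept = filter_candidates(anchor, [close, far])
    print(len(kept))
```

Identical encodings score exactly 1, and the score decays toward 0 as the encodings move apart, which is what makes a fixed threshold usable as a selection rule in a sketch like this.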