Morpheme-based Korean text cohesion analyzer

Dong-Hyun Kim, Seokho Ahn, Euijong Lee, Young-Duk Seo
DOI: https://doi.org/10.1016/j.softx.2024.101659
IF: 2.868
2024-02-16
SoftwareX
Abstract: The fundamental difference between Korean and English text analysis lies in morpheme analysis. While existing Korean text analysis relies on English analysis tools, it often yields inaccurate results due to the difficulty of morpheme analysis. The primary reason is that existing morpheme analyzers depend on eojeol tokens, which makes it challenging to capture Korean characteristics. Therefore, we introduce a Transformer-based morpheme analyzer that uses morpheme tokens to capture the inherent features of Korean sentences. We then successfully integrate this morpheme analyzer into our Korean text analysis tool and offer it as a web service for efficient use.
computer science, software engineering