KSW: Khmer Stop Word based Dictionary for Keyword Extraction

Nimol Thuon, Wangrui Zhang, Sada Thuon
DOI: https://doi.org/10.48550/arXiv.2405.17390
2024-05-28
Abstract: This paper introduces KSW, a Khmer-specific approach to keyword extraction that leverages a specialized stop word dictionary. Due to the limited availability of natural language processing resources for the Khmer language, effective keyword extraction has been a significant challenge. KSW addresses this by developing a tailored stop word dictionary and implementing a preprocessing methodology to remove stop words, thereby enhancing the extraction of meaningful keywords. Our experiments demonstrate that KSW achieves substantial improvements in accuracy and relevance compared to previous methods, highlighting its potential to advance Khmer text processing and information retrieval. The KSW resources, including the stop word dictionary, are available in the following GitHub repository: https://github.com/back-kh/KSWv2-Khmer-Stop-Word-based-Dictionary-for-Keyword-Extraction.git
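The preprocessing idea described in the abstract (filter out stop words, then extract keywords from what remains) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the four-word stop word set is a hypothetical stand-in and is not taken from the KSW dictionary, the function names `remove_stop_words` and `top_keywords` are invented, and plain frequency counting is swapped in as a simple keyword scorer, not the method evaluated in the paper.

```python
from collections import Counter

# Hypothetical, illustrative stop word set; NOT taken from the KSW dictionary.
STOP_WORDS = {"និង", "នៃ", "ជា", "នៅ"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


def top_keywords(tokens, k=3, stop_words=STOP_WORDS):
    """Rank the remaining tokens by frequency (a simple stand-in scorer)."""
    counts = Counter(remove_stop_words(tokens, stop_words))
    return [word for word, _ in counts.most_common(k)]


# Khmer is written without spaces between words, so this sketch assumes
# the text has already been segmented into word tokens.
tokens = ["ភាសា", "ខ្មែរ", "និង", "ភាសា", "នៃ", "ខ្មែរ", "ភាសា"]
keywords = top_keywords(tokens, k=2)
print(keywords)
```

In practice the stop word set would be loaded from the dictionary file in the repository, and a word segmenter would run before the filter, since the filter only works on token boundaries.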
Information Retrieval, Computation and Language