XLM-RoBERTa Model for Key Information Extraction on Military Document

Salman Shalahuddin, Jonner Hutahaean, Heru Permana, Urip Teguh Setijohatmo, Muhammad Faza Ilmanuddiin Rachmadi
DOI: https://doi.org/10.1109/ICISS59129.2023.10291826
2023-09-06
Abstract: Key Information Extraction (KIE) is the process of extracting important data from text or images, such as the content of a military document. One popular KIE method is pattern matching with regular expressions (RegEx) on document text. However, this method is inefficient because RegEx rules are fixed: each data entity requires its own dedicated rule. To address this lack of flexibility, a deep learning method based on Named Entity Recognition (NER) is proposed. NER is applied to scanned documents using XLM-RoBERTa, a transformer-based model. The model was chosen for the KIE module because it supports Indonesian and has demonstrated good accuracy in previous studies. Trained on a dataset of Indonesian-language military documents, the model achieves an F1-score of 86.55%. The method also proves flexible, as the model can extract the same entities across different document types.
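The contrast between a fixed RegEx rule and token-level NER output can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Nomor:` pattern, the sample strings, the entity names `DOC_NUMBER` and `ORG`, and the BIO tagging scheme are hypothetical and are not taken from the paper. The labels are written by hand here; in the described approach they would come from the fine-tuned XLM-RoBERTa model.

```python
import re

# Hypothetical rule for ONE document layout; not taken from the paper.
LETTER_NO = re.compile(r"Nomor\s*:\s*(\S+)")


def extract_number_regex(text):
    """Return the document number if the text matches the single fixed rule."""
    match = LETTER_NO.search(text)
    return match.group(1) if match else None


def group_bio(tokens, labels):
    """Merge per-token BIO labels (assumed tagging scheme) into entity spans."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity begins; close the previous one first.
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O" label or a stray "I-" tag: close the open entity, skip token.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# The fixed rule works on the layout it was written for...
print(extract_number_regex("Nomor: B/123/IX/2023"))
# ...but returns None when another document type words the field differently.
print(extract_number_regex("No. Surat B/123/IX/2023"))

# Hand-written labels standing in for model predictions.
tokens = ["No.", "Surat", "B/123/IX/2023", "dari", "Markas", "Besar"]
labels = ["O", "O", "B-DOC_NUMBER", "O", "B-ORG", "I-ORG"]
print(group_bio(tokens, labels))
```

Because the grouping step depends only on the predicted labels and not on the surrounding wording, the same post-processing applies to any document type, whereas the RegEx route would need one rule per layout.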
Computer Science