Information Retrieval Technique for Indonesian PDF Document with Modified Stemming Porter Method Using PHP

Faizal Riza, Saefulloh Rifai, Akmal Dirgantara, Sfenrianto, Rasenda, Syarifudin Herdyansyah
DOI: https://doi.org/10.1088/1742-6596/1477/3/032016
2020-03-01
Journal of Physics: Conference Series
Abstract: Finding relevant information in a document collection requires stemming. Stemming is the process of reducing each morphological variant of a word to its base word. Given the morphological structure of Indonesian words, Porter's stemming algorithm appears suitable, with a few modifications, for base-word searches in Indonesian-language documents. To this end, an information retrieval application for Indonesian PDF documents was built using the modified Porter stemming method and implemented in PHP (Hypertext Preprocessor). Testing was performed on 26 PDF e-books, which contained 23,197 base words out of 28,532 total words. The highest percentage of precisely stemmed base words in a single document was 94%, and the lowest was 81%. These results indicate that the application performs stemming well on Indonesian-language e-books.
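To make the stemming step concrete, the following is a minimal Python sketch of Porter-style affix stripping for Indonesian. It is an illustrative assumption, not the paper's PHP implementation or its specific modifications. The rule tables, the fixed removal order (particle, possessive pronoun, first-order prefix, derivational suffix, second-order prefix), the consonant recoding, the two-vowel guard, and every function name below are hypothetical choices made for this example.

```python
import re

# Illustrative rule tables (assumptions): the usual Porter-style affix groups
# for Indonesian, NOT the specific modifications described in the paper.
PARTICLES = ("kah", "lah", "tah", "pun")
POSSESSIVES = ("ku", "mu", "nya")
# Longer prefixes are listed before their shorter variants so they match first.
FIRST_ORDER_PREFIXES = ("meng", "meny", "men", "mem", "me",
                        "peng", "peny", "pen", "pem", "di", "ter", "ke")
SECOND_ORDER_PREFIXES = ("ber", "bel", "be", "per", "pel", "pe")
DERIVATIONAL_SUFFIXES = ("kan", "an", "i")

VOWELS = frozenset("aeiou")
MIN_VOWELS = 2  # hypothetical guard: a stem must keep at least two vowels


def vowel_count(word: str) -> int:
    """Count vowel letters as a rough proxy for syllables."""
    return sum(ch in VOWELS for ch in word)


def recode(prefix: str, rest: str) -> str:
    """Restore the initial consonant that some nasal prefixes absorb."""
    if prefix in ("meny", "peny"):                        # menyapu -> sapu
        return "s" + rest
    if prefix in ("mem", "pem") and rest[:1] in VOWELS:   # memotong -> potong
        return "p" + rest
    return rest


def strip_suffix(word: str, suffixes: tuple) -> str:
    """Remove the first listed suffix whose removal passes the vowel guard."""
    for suffix in suffixes:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if vowel_count(stem) >= MIN_VOWELS:
                return stem
    return word


def strip_prefix(word: str, prefixes: tuple) -> str:
    """Remove the first listed prefix whose removal passes the vowel guard."""
    for prefix in prefixes:
        if word.startswith(prefix):
            stem = recode(prefix, word[len(prefix):])
            if vowel_count(stem) >= MIN_VOWELS:
                return stem
    return word


def stem(word: str) -> str:
    """Strip affixes in a fixed (assumed) order and return the stem."""
    word = word.lower()
    word = strip_suffix(word, PARTICLES)
    word = strip_suffix(word, POSSESSIVES)
    word = strip_prefix(word, FIRST_ORDER_PREFIXES)
    word = strip_suffix(word, DERIVATIONAL_SUFFIXES)
    word = strip_prefix(word, SECOND_ORDER_PREFIXES)
    return word


def stem_text(text: str) -> list:
    """Split text (e.g. extracted from a PDF) into letter runs and stem each."""
    return [stem(token) for token in re.findall(r"[a-z]+", text.lower())]


if __name__ == "__main__":
    print(stem_text("Bukunya dibacakan dan pelajaran dilanjutkan."))
```

Run as a script, it prints `['buku', 'baca', 'dan', 'ajar', 'lanjut']`; for example, `mempermainkan` loses `mem-`, then `-kan`, then `per-` to give `main`. Without a dictionary of base words, a purely rule-based stemmer like this one will still over- or under-stem some words, which is one reason modifications to the basic rule set are needed.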