Modeling document labels using Latent Dirichlet allocation for archived documents in Integrated Quality Assurance System (IQAS)
Freddie Prianes, Thelma Palaoag
DOI: https://doi.org/10.12688/f1000research.130245.1
2023-01-27
F1000Research
Abstract: Background: As part of the transition of higher education institutions in the Philippines into intelligent campuses, the Commission on Higher Education has launched a program to develop smart campuses in state universities and colleges and improve operational efficiency in the country. In line with the commitment of Camarines Sur Polytechnic Colleges to improve its accreditation operations and resolve evident problems in the accreditation process, the researchers propose this study as part of an Integrated Quality Assurance System. The study aims to develop an intelligent model for categorizing and automatically tagging archived documents used during accreditation. Methods: The researchers followed an agile method to guide the study, as it promotes flexibility, speed, and, most importantly, continuous improvement during development, testing, documentation, and even after delivery of the software. This method helped the researchers design a prototype that implements the model to aid file searching and label tagging. A computational analysis is also included to further explain the results of the devised model. Results: From the processed sample corpus, the generated document labels are faculty, activities, library, research, and materials. The labels are based on total relative frequencies of 0.009884, 0.008825, 0.007413, 0.007413, and 0.006354, respectively, where relative frequency is the ratio of the number of times a term is used in the document to the total word count of the document. Conclusions: The devised model and prototype support the organization in storing and categorizing accreditation documents. This makes the data easier to retrieve and classify, addressing the main problem faced by the task group. Further, other patterns in clustering, modeling, and text classification can be integrated into the prototype.
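
The relative-frequency ratio described in the Results (term count divided by total word count) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reproduce the Latent Dirichlet allocation step. The function names `relative_frequencies` and `top_labels`, the lowercase regex tokenization, the optional stopword filter, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each term to (times the term appears) / (total word count)."""
    # Assumption: tokens are runs of ASCII letters, compared in lowercase.
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


def top_labels(text, k=5, stopwords=frozenset()):
    """Return the k terms with the highest relative frequency as (term, value) pairs."""
    freqs = relative_frequencies(text)
    # Assumption: stopwords still count toward the total word count,
    # but are never proposed as labels.
    candidates = {t: f for t, f in freqs.items() if t not in stopwords}
    # Sort by descending frequency; break ties alphabetically for stable output.
    return sorted(candidates.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical sample text, not taken from the study's corpus.
sample = "faculty research faculty library activities faculty research materials"
labels = top_labels(sample, k=3)
print(labels)
```

In this toy sample, "faculty" appears 3 times out of 8 words, giving a relative frequency of 0.375. The much smaller values reported in the abstract would be consistent with a far larger document, although the exact preprocessing used in the study is not stated here.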