A Multi-Model-Based Approach to Corpus Text Pre-Screening.

Zuhao Wu, Yude Bi
DOI: https://doi.org/10.1109/IALP61005.2023.10337320
2023-01-01
Abstract: Constructing a self-built corpus for a research topic requires pre-screening the acquired texts, because some of them may, for various reasons, not be directly relevant to the research subject. Pre-screening can be done manually or algorithmically, but manual screening becomes impractical for large volumes of text. We therefore propose a multi-model approach that uses the TextRank, TF-IDF, and KNN algorithms to pre-screen corpus texts. We then evaluate the effectiveness of this method.
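To make the idea concrete, the following is a minimal sketch of how TF-IDF features and a KNN vote could be used to flag texts as relevant or irrelevant to a topic. It is not the authors' implementation: the abstract does not say how the three models are combined, the TextRank step is omitted here, and the seed texts, labels, and all function names (`tfidf_vectors`, `is_relevant`, and so on) are hypothetical choices made for illustration. The IDF formula is the common smoothed variant, also an assumption.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption; a real pipeline would do more).
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors, idf

def vectorize(text, idf):
    """Project a new text onto the known vocabulary; unseen terms are dropped."""
    toks = tokenize(text)
    tf = Counter(toks)
    total = len(toks) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical hand-labelled seed texts: True = relevant to the research topic.
SEED = [
    ("corpus linguistics studies annotated text collections", True),
    ("building a corpus requires screening collected text", True),
    ("text classification filters documents for a corpus", True),
    ("the football team won the match last night", False),
    ("cheap flights and hotel deals for summer travel", False),
    ("new recipe for chocolate cake with butter", False),
]
SEED_VECS, IDF = tfidf_vectors([text for text, _ in SEED])
SEED_LABELS = [label for _, label in SEED]

def is_relevant(text, k=3):
    """Majority vote among the k seed texts most similar to `text`."""
    query = vectorize(text, IDF)
    ranked = sorted(
        zip(SEED_VECS, SEED_LABELS),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    for candidate in [
        "screening text documents for a research corpus",
        "summer football match travel deals",
    ]:
        print(candidate, "->", "keep" if is_relevant(candidate) else "discard")
```

In a pipeline of the kind the abstract describes, a keyword-extraction step such as TextRank could plausibly run before vectorization to reduce each text to its most salient terms, but that ordering is a guess rather than something the abstract states.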