Military Web Information Extraction Incorporating Resource Metadata Distribution

Hui Li, Jiahao Zhou, Wei Sun
DOI: https://doi.org/10.1109/insai56792.2022.00019
2022-01-01
Abstract: We analyze the relationship between the metadata distribution of military resources and the distribution of web content across the three major sections of military websites: news, events, and resources. Based on this analysis, we propose a method for extracting information from military web pages that integrates the metadata distribution with the basic features of those pages, widening the gap between content density and noisy-text density so that the main content can be extracted. To verify the method's practical effectiveness, 69,379 web pages were randomly selected from the three sections for the experiments. The results show that the proposed method achieves better content extraction performance than the CETR, VIPS, SRV, and WNISK algorithms, with F-values of 98.21%, 97.11%, and 97.46% for the three sections, respectively.
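As a rough illustration of the density idea mentioned in the abstract, the sketch below scores each HTML line by a simple text-to-tag ratio and keeps the lines that exceed a threshold. This is a generic density heuristic and not the authors' method: the metadata-distribution component is not reproduced, and the names `text_density`, `extract_content`, and `f_measure`, as well as the threshold of 20.0, are assumptions made for this example. The `f_measure` helper uses the standard harmonic mean of precision and recall, which is presumably what the reported F-values refer to.

```python
import re

# Matches any HTML tag; a deliberately simple pattern for illustration only.
TAG_RE = re.compile(r"<[^>]*>")


def text_density(line: str) -> float:
    """Visible text characters per tag on one line of HTML."""
    tag_count = len(TAG_RE.findall(line))
    text = TAG_RE.sub("", line).strip()
    # Treat tag-free lines as having one tag to avoid division by zero.
    return len(text) / max(tag_count, 1)


def extract_content(html: str, threshold: float = 20.0) -> list[str]:
    """Keep the text of lines whose density exceeds a hypothetical threshold."""
    kept = []
    for line in html.splitlines():
        if text_density(line) > threshold:
            kept.append(TAG_RE.sub("", line).strip())
    return kept


def f_measure(precision: float, recall: float) -> float:
    """Standard F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Navigation bars and widget rows tend to have many tags and little text, so they score low, while article paragraphs score high. A method like the one described would aim to make that separation larger than a plain ratio can.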