Scraping the ACM Digital Library

Donna Bergmark, Paradee Phempoonpanich, Shumin Zhao
DOI: https://doi.org/10.1145/511144.511146
2001-09-01
ACM SIGIR Forum
Abstract: As part of a larger project to automatically reference link the online scholarly literature, an attempt to analyze PDF documents was undertaken. The ACM Digital Library was used as the corpus for these experiments. With the current PDF and HTML analysis tools, roughly 80% accuracy was obtained in the automatic extraction of reference linking information.