PMC text mining subset in BioC: 2.3 million full text articles and growing

Donald C. Comeau, Chih-Hsuan Wei, Rezarta Islamaj Doğan, Zhiyong Lu
DOI: https://doi.org/10.48550/arXiv.1804.05957
2018-04-17
Abstract: Interest in text mining the full text of biomedical research articles is growing. NCBI provides the PMC Open Access and Author Manuscript sets of articles, which are available for text mining. We have made all of these articles available in BioC, an XML and JSON format that is convenient for sharing text, annotations, and relations. These articles are available both via FTP for bulk download and via a Web API for updates or more focused collection. Availability: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/
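To make the "text, annotations, and relations" layout more concrete, here is a minimal sketch of walking a BioC-style JSON collection in Python. The sample document is hand-written for illustration and is not real PMC data. The field names (`documents`, `passages`, `infons`, `offset`, `text`) follow the general BioC layout as commonly described, but the exact shape returned by the service is an assumption here and should be checked against the Availability page. The helper `iter_passages` is a hypothetical name, not part of any BioC library.

```python
import json

# Hand-written sample shaped like a BioC JSON collection.
# This is illustrative only; it is not real PMC content.
SAMPLE = """
{
  "source": "example",
  "date": "20180417",
  "key": "example.key",
  "documents": [
    {
      "id": "DOC1",
      "infons": {},
      "passages": [
        {"offset": 0, "infons": {"type": "title"},
         "text": "A sample title", "annotations": [], "relations": []},
        {"offset": 15, "infons": {"type": "abstract"},
         "text": "A sample abstract.", "annotations": [], "relations": []}
      ]
    }
  ]
}
"""


def iter_passages(collection):
    """Yield (document id, passage type, offset, text) for each passage.

    Hypothetical helper: assumes the collection is a dict with a
    "documents" list, each holding a "passages" list.
    """
    for document in collection.get("documents", []):
        for passage in document.get("passages", []):
            yield (
                document.get("id"),
                passage.get("infons", {}).get("type"),
                passage.get("offset"),
                passage.get("text", ""),
            )


collection = json.loads(SAMPLE)
passages = list(iter_passages(collection))
for doc_id, kind, offset, text in passages:
    print(doc_id, kind, offset, text)
```

Because each passage carries its own offset and type, a text-mining pipeline can process sections such as the title or abstract separately while still mapping annotation positions back to the full document.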
Digital Libraries