PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data

Alberto Calderone
DOI: https://doi.org/10.48550/arXiv.2011.03123
2020-11-09
Abstract: The number of scientific papers published every day is daunting and constantly increasing, and keeping up with the literature is a challenge. When one starts exploring a new topic, it is hard to get the big picture without reading many articles. Furthermore, as one reads through the literature, making mental connections is crucial for asking new questions that might lead to discoveries. In this work, I present a web tool that uses a text-mining strategy to transform large collections of unstructured biomedical articles into structured data. The generated results give a quick overview of complex topics and can suggest information that is not explicitly reported. In particular, I show two data science analyses. First, I present a literature-based rare disease network built using this tool, in the hope that it will help clarify some aspects of these less popular pathologies. Second, I show how a literature-based analysis conducted with PubSqueezer results makes it possible to describe known facts about SARS-CoV-2. In one sentence, data generated with PubSqueezer make it easy to use scientific literature in any computational analysis, such as machine learning or natural language processing. Availability: http://www.pubsqueezer.com
Information Retrieval, Computation and Language, Quantitative Methods