Performing Literature Review Using Text Mining, Part I: Retrieving Technology Infrastructure Using Google Scholar and APIs

Dazhi Yang, Allan N. Zhang, Wenjing Yan
DOI: https://doi.org/10.1109/bigdata.2017.8258313
2017-01-01
Abstract: Technology infrastructure (TechInfra) refers to metadata describing an academic field, such as journals and conferences, authors, publications, and organizations. Understanding the TechInfra is often the first step in performing a literature review on a particular topic. In this paper, a study is conducted to retrieve the TechInfra for a topic in supply chain management, namely last mile logistics. Google Scholar is used as the primary tool for data collection: the first 1,000 results it returns are downloaded as HTML files. Subsequently, various application programming interfaces (APIs), e.g., the ScienceDirect, IEEE, and CrossRef APIs, are used to enhance the data quality. Plots are used to visualize the TechInfra of last mile logistics.
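The enrichment step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a title recovered from a Google Scholar result is looked up through the public CrossRef REST API (`/works` endpoint with the `query.bibliographic` parameter) and that the returned record is reduced to the TechInfra fields the abstract lists. The function names `crossref_query_url` and `flatten_record`, and the shape of the output dictionary, are hypothetical choices made for this illustration.

```python
from urllib.parse import urlencode

# Public CrossRef REST API endpoint for searching works.
CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(title, rows=1):
    """Build a CrossRef search URL for a publication title (hypothetical helper)."""
    return CROSSREF_WORKS + "?" + urlencode(
        {"query.bibliographic": title, "rows": rows}
    )


def flatten_record(item):
    """Reduce one entry of a CrossRef response's message.items list to
    venue, author, publication, and organization fields (hypothetical helper)."""
    author_entries = item.get("author", [])
    # Join given and family names, skipping whichever part is missing.
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in author_entries
    ]
    # Collect distinct affiliation names across all authors.
    organizations = sorted(
        {
            aff["name"]
            for a in author_entries
            for aff in a.get("affiliation", [])
            if aff.get("name")
        }
    )
    return {
        "doi": item.get("DOI"),
        # CrossRef returns titles and venue names as lists; take the first.
        "title": (item.get("title") or [None])[0],
        "venue": (item.get("container-title") or [None])[0],
        "authors": authors,
        "organizations": organizations,
    }
```

In use, the URL from `crossref_query_url` would be fetched (for example with `urllib.request.urlopen`), the JSON body decoded, and each element of `response["message"]["items"]` passed to `flatten_record`. Keeping the parsing separate from the network call lets the flattening logic be checked offline against saved responses.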