Abstract

Background: Natural language processing (NLP) is a long-established field of computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information worldwide and the growing importance of understanding and mining big data in the medical field, NLP is becoming more crucial.

Objective: The goal of this research was to systematically review the use of NLP in medical research in order to understand global progress in terms of research outcomes, content, methods, and the study groups involved.

Methods: A systematic review was conducted using the PubMed database as the search platform. All published studies on the application of NLP in medicine (except biomedicine) during the 20 years from 1999 to 2018 were retrieved. The data obtained from these studies were cleaned and structured. Excel (Microsoft Corp) and VOSviewer (Nees Jan van Eck and Ludo Waltman) were used to perform bibliometric analysis of publication trends, author orders, countries, institutions, collaboration relationships, research hot spots, diseases studied, and research methods.

Results: A total of 3498 articles were obtained during initial screening, and 2336 articles were found to meet the study criteria after manual screening. The number of publications increased every year, with marked growth after 2012 (from 148 to a maximum of 302 publications annually). The United States has held the leading position since the inception of the field, with the largest number of articles published: it contributed 63.01% (1472/2336) of all publications, followed by France (5.44%, 127/2336) and the United Kingdom (3.51%, 82/2336). The author with the largest number of articles published was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest number of articles as first and corresponding authors, respectively. Among first-author affiliations, Columbia University published the largest number of articles, accounting for 4.54% (106/2336) of the total. Fewer than one-fifth (17.68%, 413/2336) of the articles involved research on specific diseases, and these primarily focused on mental illness (16.46%, 68/413), breast cancer (5.81%, 24/413), and pneumonia (4.12%, 17/413).

Conclusions: NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most used research materials, but social media such as Twitter have become important research materials since 2015. Cancer (24.94%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancer (23.30%, 24/103) and lung cancer (14.56%, 15/103) accounting for the highest proportions of studies. Columbia University and the researchers trained there were the most active and prolific research forces in medical NLP.
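The proportions reported in the abstract can be recomputed from the raw counts it gives. The following is a minimal sketch, not the authors' analysis (which used Excel and VOSviewer): the variable and function names are hypothetical, and the only inputs are the counts quoted in the abstract. It also computes a simple mean over the 20-year window, which comes out somewhat higher than the "approximately 100" per year cited in the Conclusions.

```python
# Counts copied from the abstract; all names here are illustrative assumptions.
TOTAL_ARTICLES = 2336      # articles meeting the study criteria
DISEASE_ARTICLES = 413     # articles on specific diseases
CANCER_ARTICLES = 103      # disease articles concerning cancer
YEARS_COVERED = 20         # 1999 through 2018


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Share of all included articles, by country of publication.
country_counts = {"United States": 1472, "France": 127, "United Kingdom": 82}
country_shares = {name: share(n, TOTAL_ARTICLES) for name, n in country_counts.items()}

# Share of disease-specific articles, by subject area.
disease_counts = {"mental illness": 68, "breast cancer": 24, "pneumonia": 17}
disease_shares = {name: share(n, DISEASE_ARTICLES) for name, n in disease_counts.items()}

# Share of cancer articles, by cancer type.
cancer_counts = {"breast cancer": 24, "lung cancer": 15}
cancer_shares = {name: share(n, CANCER_ARTICLES) for name, n in cancer_counts.items()}

# Other headline proportions from the abstract.
columbia_share = share(106, TOTAL_ARTICLES)
disease_share = share(DISEASE_ARTICLES, TOTAL_ARTICLES)
cancer_share = share(CANCER_ARTICLES, DISEASE_ARTICLES)

# Simple arithmetic mean of included articles per year.
mean_per_year = TOTAL_ARTICLES / YEARS_COVERED

if __name__ == "__main__":
    for label, table in [("Country", country_shares),
                         ("Disease", disease_shares),
                         ("Cancer type", cancer_shares)]:
        for name, pct in table.items():
            print(f"{label}: {name} = {pct:.2f}%")
    print(f"Columbia University (first-author affiliation) = {columbia_share:.2f}%")
    print(f"Disease-specific articles = {disease_share:.2f}%")
    print(f"Cancer among disease-specific articles = {cancer_share:.2f}%")
    print(f"Mean articles per year = {mean_per_year:.1f}")
```

Each percentage uses the denominator stated next to it in the abstract (2336, 413, or 103), which is why breast cancer appears twice with different shares.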

Systematic Evaluation of Research Progress on Natural Language Processing in Medicine Over the Past 20 Years: Bibliometric Study on PubMed
