Natural language processing: A window to understanding skincare trends

Jack A Cummins, Vinod E Nambudiri
DOI: https://doi.org/10.1016/j.ijmedinf.2022.104705
Abstract:
Background: Reddit is a popular social media discussion forum. Reddit data can be analyzed with natural language processing techniques to gain insights into public health questions by tracking how often relevant topics are discussed over time and by analyzing discussion content.
Objectives: To apply natural language processing techniques to categorize, track, and gain insights from comments regarding skincare-related topics on Reddit using sentiment analysis and word search techniques.
Material and methods: Historical Reddit comments available on Google BigQuery from the r/SkincareAddiction subreddit were selected and preprocessed. Latent Dirichlet Allocation was applied to create topics. Selected topics were further investigated for interest over time by determining the comment frequencies of words of interest. Sentiment analysis was also applied to each topic.
Results: More than 3,000,000 comments were analyzed and classified into 25 topics. Topics related to sunscreen, diet, and exfoliants were examined for response frequencies over time, demonstrating seasonal variation. Comment frequencies for the words "coral" and "oxybenzone" showed peaks that corresponded to media coverage of sunscreen-associated coral bleaching. Queries containing "physical" and "mineral" demonstrated an evolution in word choice describing physical/mineral sunscreens over time. Sentiment analysis demonstrated a range from mildly positive to moderately positive sentiment across the five examined skincare topics.
Limitations: Our analysis was limited to a single subreddit. Additionally, Latent Dirichlet Allocation is an unsupervised model; its accuracy cannot be readily assessed. Word-frequency queries, while powerful, cannot reveal trends in words that the user does not intentionally query.
Conclusions: Natural language processing is a powerful tool to examine large dermatology discussion forums and gain insights into patient perceptions of the field.
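The word-frequency tracking described in the methods can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the function name monthly_keyword_counts, the (Unix timestamp, comment text) input layout, the simple case-insensitive substring match, and the sample comments are all hypothetical and are not the authors' code or data.

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_keyword_counts(comments, keyword):
    """Count comments per calendar month (UTC) whose text contains `keyword`.

    `comments` is an iterable of (unix_timestamp, text) pairs. This input
    layout is an assumption for illustration, not the paper's schema.
    """
    keyword = keyword.lower()
    counts = Counter()
    for created_utc, body in comments:
        # Case-insensitive substring match; a real analysis might tokenize
        # first so that a query does not match inside longer words.
        if keyword in body.lower():
            month = datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")
            counts[month] += 1
    # Return months in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical sample data, invented for this example.
sample_comments = [
    (1525132800, "Does oxybenzone really harm coral reefs?"),                    # 2018-05-01
    (1525219200, "Switched to a mineral sunscreen because of the coral news."),  # 2018-05-02
    (1530403200, "My physical sunscreen leaves a white cast."),                  # 2018-07-01
]

coral_counts = monthly_keyword_counts(sample_comments, "coral")
sunscreen_counts = monthly_keyword_counts(sample_comments, "sunscreen")
print(coral_counts)
print(sunscreen_counts)
```

Plotting such monthly counts for a chosen word is one way to look for the seasonal variation and media-driven peaks the abstract reports. As the limitations note, this approach only surfaces trends for words that are explicitly queried.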