Text2Time: Transformer-based Article Time Period Prediction

Karthick Prasad Gunasekaran, B Chase Babrich, Saurabh Shirodkar, Hee Hwang
DOI: https://doi.org/10.13140/RG.2.2.29195.36641
2023-04-24
Abstract: The task of predicting the publication period of text documents, such as news articles, is an important but less studied problem in the field of natural language processing. Predicting the year of a news article can be useful in various contexts, such as historical research, sentiment analysis, and media monitoring. In this work, we investigate the problem of predicting the publication period of a text document, specifically a news article, based on its textual content. In order to do so, we created our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades. In our approach, we use a pretrained BERT model fine-tuned for the task of text classification, specifically for time period prediction. This model exceeds our expectations and provides some very impressive results in terms of accurately classifying news articles into their respective publication decades. The results beat the performance of the baseline model for this relatively unexplored task of time prediction from text.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of predicting the publication period of text documents such as news articles. Specifically, the authors predict the decade in which a news article was published from its text content alone. The problem is important but less studied in natural language processing.

Predicting when a news article was published is useful in historical research, sentiment analysis, and media monitoring. In historical research, it can help scholars understand the events of a specific period and their effects on society, culture, and politics. In sentiment analysis, it can show how public opinion evolves over time. In media monitoring, it helps track trends and patterns in media coverage, verify the authenticity of old news articles, and limit the spread of false information.

To study the problem, the authors built a dataset of more than 350,000 news articles published in The New York Times over a period of 60 years and fine-tuned a pre-trained BERT model to perform time period prediction as a text classification task. The model reached 82% accuracy on the test data, far exceeding the baseline model. Beyond proposing a method for predicting the publication time of texts, the authors also examine how the model's performance differs across article types and which factors may affect prediction accuracy, such as keyword usage and article length.
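The framing of time period prediction as a classification task can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `start_year` parameter, and the example years are hypothetical, and the majority-class baseline is only an illustrative stand-in, since this page does not say which baseline the paper uses. The sketch shows how a publication year could be mapped to a decade class label and how accuracy over those labels could be computed.

```python
from collections import Counter


def year_to_decade_label(year: int, start_year: int) -> int:
    """Map a publication year to a zero-based decade class index.

    start_year is a hypothetical parameter: the first year covered by the
    dataset. Each consecutive 10-year span becomes one class.
    """
    if year < start_year:
        raise ValueError("year precedes start_year")
    return (year - start_year) // 10


def accuracy(predicted: list[int], actual: list[int]) -> float:
    """Fraction of articles whose predicted decade matches the true decade."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def majority_baseline(train_labels: list[int], n_test: int) -> list[int]:
    """Illustrative baseline (an assumption, not the paper's): always
    predict the most frequent decade seen in the training labels."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_test


if __name__ == "__main__":
    # Hypothetical example years; the real dataset's year range is not
    # stated on this page.
    years = [1961, 1975, 1987, 1989, 2004]
    labels = [year_to_decade_label(y, start_year=1960) for y in years]
    baseline_preds = majority_baseline(labels, n_test=len(labels))
    print(labels, accuracy(baseline_preds, labels))
```

Under this framing, a fine-tuned classifier would output one of six class indices for a six-decade dataset, and the reported test accuracy is the same match rate that `accuracy` computes over the held-out articles.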