Identifying epochs in text archives

Tobias Blanke, Jon Wilson
DOI: https://doi.org/10.1109/bigdata.2017.8258172
2017-12-01
Abstract: This paper develops an automated approach to the ‘distant reading’ of textual archives in order to classify epochs in the use of language and examine their particular characteristics. It classifies epochs by applying a series of standardised dictionaries to map the semantics of government documents, using the changing frequency of terms in these dictionaries to identify moments of rupture in language. It then tests a variety of techniques, particularly topic models and word2vec word embeddings, to chart the relationship between the changing shape of individual linguistic elements and aggregate patterns. The result is a set of largely automated tools for understanding the structure of digital textual archives.
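The dictionary-frequency idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's dictionaries, corpus, or rupture criterion, so everything here is an assumption made for illustration: the function names `dictionary_share` and `largest_shift` are hypothetical, the toy documents and the `welfare_terms` dictionary are invented, and "rupture" is approximated as the largest year-on-year change in the share of tokens that match the dictionary.

```python
from collections import Counter


def dictionary_share(text, dictionary):
    """Return the fraction of whitespace-separated tokens found in `dictionary`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[term] for term in dictionary)
    return hits / len(tokens)


def largest_shift(docs_by_year, dictionary):
    """Return (year, change) for the largest absolute year-on-year change in share.

    This is a stand-in criterion (an assumption), not the paper's own method.
    """
    years = sorted(docs_by_year)
    shares = [dictionary_share(docs_by_year[y], dictionary) for y in years]
    changes = [
        (years[i], abs(shares[i] - shares[i - 1]))
        for i in range(1, len(years))
    ]
    return max(changes, key=lambda pair: pair[1])


# Invented toy data: one short "document" per year.
docs_by_year = {
    1900: "the board resolved the revenue question",
    1901: "the board resolved the revenue matter",
    1902: "welfare and welfare reform for the people",
}
welfare_terms = {"welfare", "reform", "people"}  # hypothetical dictionary

rupture_year, change = largest_shift(docs_by_year, welfare_terms)
print(rupture_year, round(change, 3))
```

On a real archive one would likely tokenise more carefully, aggregate many documents per period, and smooth the series before looking for shifts; this sketch only shows the basic shape of the computation.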