Temporal Analysis of Literary and Programming Prose

Brian Michalski, Mukkai Krishnamoorthy, Tsz-Yam Lau
DOI: https://doi.org/10.48550/arXiv.1202.2131
2012-02-10
Abstract: Literary works reference a variety of globally shared themes, including well-known people, events, and time periods. It is particularly interesting to locate patterns that are either invariant across time or exhibit a characteristic change across time, as they could imply something important about the society that those works record. This paper suggests using the Google n-gram viewer as a fast prototyping method for examining time-based properties over a rich sample of literary prose. Using this method, we find that some repeating periods of time, like Sunday, are referenced disproportionately, allowing us to pose questions such as why a day like Thursday is so unpopular. Furthermore, by treating software as a work of prose, we can apply a similar analysis to open-source software repositories and explore time-based relations in commit logs. Through a simple statistical analysis of a few temporal keywords in the log records, we reinforce some beliefs and weaken others about how college students approach open-source software. Finally, we help readers working on their own temporal analysis by comparing the fundamental differences between literary works and code repositories, and we suggest blogs and wikis as recently emerging kinds of work to examine.
Software Engineering, Digital Libraries