Effects of evolutionary linguistics in text classification

Julia Efremova, Alejandro Montes García, Jianpeng Zhang, Toon Calders
DOI: https://doi.org/10.1007/978-3-319-25789-1_6
2015-01-01
Abstract: We perform an empirical study to explore the role of evolutionary linguistics in the text classification problem. We conduct experiments on a real-world collection of more than 100,000 Dutch historical notary acts. The document collection spans over six centuries. Over such a long time period, some lexical terms changed significantly. Person names, professions and other information changed over time as well. Standard text classification techniques that ignore the temporal information of the documents might not produce optimal results in our case. Therefore, we analyse the temporal aspects of the corpus. We explore the effect of training and testing the model on different time periods. We use time periods that correspond to the main historical events and also apply clustering techniques in order to create time periods in a data-driven way. All experiments show a strong time-dependency of our corpus. Exploiting this dependence, we extend standard classification techniques by combining different models trained on particular time periods and achieve overall accuracy above 90% and macro-average indicators above 63%.
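The idea of combining models trained on particular time periods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier, so a small Naive Bayes model is swapped in here, and the class names (`NaiveBayes`, `PeriodEnsemble`), the helper `period_of`, the boundary year 1800, and the toy documents are all hypothetical choices made only for illustration. Each document is routed to the model trained on its own period, with a model trained on all data as a fallback.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = text.lower().split()
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_docs[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def period_of(year, boundaries):
    """Index of the period containing `year`, given sorted boundary years."""
    return sum(year >= b for b in boundaries)


class PeriodEnsemble:
    """One model per time period, plus a fallback trained on the whole corpus."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        self.models = {}
        self.fallback = NaiveBayes()

    def fit(self, docs, years, labels):
        grouped = defaultdict(lambda: ([], []))
        for text, year, label in zip(docs, years, labels):
            period = period_of(year, self.boundaries)
            grouped[period][0].append(text)
            grouped[period][1].append(label)
        for period, (texts, labs) in grouped.items():
            self.models[period] = NaiveBayes().fit(texts, labs)
        self.fallback.fit(docs, labels)
        return self

    def predict(self, text, year):
        period = period_of(year, self.boundaries)
        return self.models.get(period, self.fallback).predict(text)


# Toy, invented data: the spellings and labels are placeholders, not corpus samples.
docs = ["koop van huys en erve", "testament van de weduwe",
        "verkoop van huis en erf", "laatste wil van de weduwe"]
years = [1650, 1650, 1850, 1850]
labels = ["sale", "will", "sale", "will"]

# Hypothetical single boundary at 1800, giving two periods.
ensemble = PeriodEnsemble(boundaries=[1800]).fit(docs, years, labels)
print(ensemble.predict("koop van huys", 1650))
print(ensemble.predict("verkoop van huis", 1850))
```

In this sketch the period boundaries are fixed by hand. The abstract also mentions deriving periods in a data-driven way through clustering, which would replace the hard-coded `boundaries` list.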