Prediction of adverse events in Afghanistan: regression analysis of time series data grouped not by geographic dependencies

Krzysztof Fiok,Waldemar Karwowski,Maciej Wilamowski
DOI: https://doi.org/10.48550/arXiv.2002.12211
2020-02-27
Abstract:The aim of this study was to approach a difficult regression task on highly unbalanced data regarding active theater of war in Afghanistan. Our focus was set on predicting the negative events number without distinguishing precise nature of the events given historical data on investment and negative events per each of predefined 400 Afghanistan districts. In contrast with previous research on the matter, we propose an approach to analysis of time series data that benefits from non-conventional aggregation of these territorial entities. By carrying out initial exploratory data analysis we demonstrate that dividing data according to our proposal allows to identify strong trend and seasonal components in the selected target variable. Utilizing this approach we also tried to estimate which data regarding investments is most important for prediction performance. Based on our exploratory analysis and previous research we prepared 5 sets of independent variables that were fed to 3 machine learning regression models. The results expressed by mean absolute and mean square errors indicate that leveraging historical data regarding target variable allows for reasonable performance, however unfortunately other proposed independent variables does not seem to improve prediction quality.
Machine Learning
What problem does this paper attempt to address?