Robust Datamining

Uwe Aickelin
DOI: https://doi.org/10.2139/ssrn.3070070
2017-01-01
SSRN Electronic Journal
Abstract: Our long-term research goal is to develop datamining methodologies that are robust to changes in data and to uncertainty. By robust we mean that solutions remain ‘optimal’ when things change, or are easily repaired. Broadly, this robustness can be achieved in two ways: first, by having ‘slack’ in the solution; or second, by constructing the solution such that it is easily repairable, e.g. failures are isolated. Uncertainty in datamining can be introduced in many ways. Some of it can be due to unreliable data collection, noisy data, or simply continuous, changing real-time data streams. However, the part of uncertainty of most interest to us is that introduced by the human angle. For instance, we know from past research that the same experts make different decisions based on the same data when approached a month later. We also hypothesise that under certain conditions people change their behaviour or strategies, e.g. from co-operating to competing. In the field of optimisation, robustness has been explored extensively and there are some mature approaches, such as stochastic programming. In the field of datamining, this is a newer concept and only some basic approaches exist, such as robust Principal Component Analysis. A completely novel approach could be a semi-supervised ‘uncertainty coefficient’ algorithm.