FSDA: Tackling Tail-Event Analysis in Imbalanced Time Series Data with Feature Selection and Data Augmentation

Raphael Krief, Eric Benhamou, Beatrice Guez, Jean-Jacques Ohana, David Saltiel, Rida Laraki, Jamal Atif
DOI: https://doi.org/10.2139/ssrn.4557797
2023-01-01
SSRN Electronic Journal
Abstract: Efficient management of imbalanced time series data is of paramount importance when data located in the tails, particularly extreme values, have a substantial influence on predictive outcomes. This paper introduces FSDA (Feature Selection and Data Augmentation), a combined approach of feature selection and data augmentation, to address this issue. FSDA aims to identify the most predictive features for tail data, which may exhibit different sensitivities compared to the rest of the dataset. Data augmentation, a conventional technique for handling imbalanced data, is employed to enhance the accuracy of machine learning regression methods. Augmented information is strategically incorporated using time-warping and drift methods to maintain the temporal integrity of the data. Empirical evidence based on a use case in financial data reveals that FSDA consistently outperforms feature selection (FS) and data augmentation (DA) methods across all percentiles ranging from 85 to 99, demonstrating its efficacy in managing imbalanced time series data and improving predictive accuracy.
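The abstract names time-warping and drift as the augmentation methods but gives no implementation details. The following is a minimal sketch of how such tail-focused augmentation might look, not the authors' code. All function names (`drift`, `time_warp`, `augment_tail`), parameters (`max_drift`, `strength`, `n_knots`, `copies`), and the choice to define the tail by a percentile of the regression target are assumptions made for illustration. The feature selection step is not shown.

```python
import numpy as np

def drift(x, rng, max_drift=0.1, n_knots=4):
    """Add a smooth, slowly varying offset to a 1-D series (hypothetical helper)."""
    n = len(x)
    knots = np.linspace(0, n - 1, n_knots)
    # Random offsets at a few knots, scaled by the series' own volatility.
    offsets = rng.uniform(-max_drift, max_drift, size=n_knots) * np.std(x)
    return x + np.interp(np.arange(n), knots, offsets)

def time_warp(x, rng, strength=0.2, n_knots=4):
    """Resample a 1-D series along a smoothly distorted, monotone time axis.

    Assumes 0 <= strength < 1 so that every step stays positive.
    """
    n = len(x)
    # Positive random step sizes yield a monotonically increasing warped clock.
    steps = rng.uniform(1 - strength, 1 + strength, size=n_knots)
    knots_warped = np.concatenate([[0.0], np.cumsum(steps)])
    knots_warped *= (n - 1) / knots_warped[-1]  # keep the end point fixed
    knots_orig = np.linspace(0, n - 1, n_knots + 1)
    warped_t = np.interp(np.arange(n), knots_orig, knots_warped)
    # Temporal order is preserved because warped_t is non-decreasing.
    return np.interp(warped_t, np.arange(n), x)

def augment_tail(windows, targets, percentile=90, copies=2, seed=0):
    """Append warped and drifted copies of windows whose target lies in the upper tail.

    windows: array of shape (n_samples, window_length)
    targets: array of shape (n_samples,)
    """
    rng = np.random.default_rng(seed)
    threshold = np.percentile(targets, percentile)
    tail = targets >= threshold  # assumed definition of a "tail" sample
    all_w, all_y = [windows], [targets]
    for _ in range(copies):
        aug = np.stack([drift(time_warp(w, rng), rng) for w in windows[tail]])
        all_w.append(aug)
        all_y.append(targets[tail])
    return np.concatenate(all_w), np.concatenate(all_y)
```

Under these assumptions, calling `augment_tail(windows, targets, percentile=90)` returns the original samples followed by `copies` perturbed versions of each tail sample, which raises the share of tail observations seen by a downstream regressor. Both transforms leave the ordering of observations within a window unchanged, which is one way to read the abstract's requirement of maintaining temporal integrity.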