TS-Diffusion: Generating Highly Complex Time Series with Diffusion Models

Yangming Li
2023-11-07
Abstract: While current generative models have achieved promising performance in time-series synthesis, they either make strong assumptions about the data format (e.g., regularities) or rely on pre-processing approaches (e.g., interpolations) to simplify the raw data. In this work, we consider a class of time series with three common bad properties (sampling irregularities, missingness, and large feature-temporal dimensions) and introduce a general model, TS-Diffusion, to process such complex time series. Our model consists of three parts under the framework of point process. The first part is an encoder of the neural ordinary differential equation (ODE) that converts time series into dense representations, with the jump technique to capture sampling irregularities and a self-attention mechanism to handle missing values. The second component of TS-Diffusion is a diffusion model that learns from the representation of time series. These time-series representations can have a complex distribution because of their high dimensions. The third part is a decoder of another ODE that generates time series with irregularities and missing values given their representations. We have conducted extensive experiments on multiple time-series datasets, demonstrating that TS-Diffusion achieves excellent results on both conventional and complex time series and significantly outperforms previous baselines.
Machine Learning
What problem does this paper attempt to address?
The paper addresses three common adverse characteristics encountered when generating complex time series data: irregular sampling, missing values, and large feature-temporal dimensions. Existing time series generation models either make strong assumptions about the data format (e.g., regularity) or rely on preprocessing methods (e.g., interpolation) to simplify the raw data. The paper proposes a new model, TS-Diffusion, which aims to handle time series with all three characteristics without such preprocessing.

The TS-Diffusion model consists of three parts:

1. **Encoder**: A neural ordinary differential equation (ODE) encoder that converts the time series into dense representations, using the jump technique to capture irregular sampling and a self-attention mechanism to handle missing values.
2. **Diffusion Model**: Learns from the dense representations of the time series. Because these representations are high-dimensional, their distribution can be complex, which motivates the use of a diffusion model.
3. **Decoder**: A second, continuous-time decoder based on a neural ODE that generates time series with irregularities and missing values from the representations.

Experimental results show that TS-Diffusion performs well on multiple time series datasets and significantly outperforms existing baseline models, especially on complex time series data such as medical records.
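To make the data characteristics named above concrete, the following is a minimal sketch of how an irregularly sampled series with missing values might be represented before it reaches any encoder. It is not taken from the paper: the `IrregularSeries` class, its field names, and the toy numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IrregularSeries:
    """Hypothetical container for one irregularly sampled multivariate series."""

    times: List[float]                   # increasing observation timestamps
    values: List[List[Optional[float]]]  # values[i][j]: feature j at times[i]; None = missing

    def gaps(self) -> List[float]:
        """Intervals between consecutive observations (unequal if sampling is irregular)."""
        return [later - earlier for earlier, later in zip(self.times, self.times[1:])]

    def mask(self) -> List[List[int]]:
        """Observation mask: 1 where a feature was recorded, 0 where it is missing."""
        return [[0 if v is None else 1 for v in row] for row in self.values]

    def missing_rate(self) -> float:
        """Fraction of (time, feature) entries that are missing."""
        m = self.mask()
        total = sum(len(row) for row in m)
        return 1.0 - sum(sum(row) for row in m) / total


# Toy example (made-up numbers): 4 observations of 2 features at uneven times.
series = IrregularSeries(
    times=[0.0, 0.5, 2.0, 2.25],
    values=[[1.0, None], [1.5, 0.25], [None, 0.5], [2.0, None]],
)
gaps = series.gaps()
mask = series.mask()
rate = series.missing_rate()
```

Keeping the timestamps and the mask alongside the raw values, rather than interpolating onto a regular grid, preserves exactly the information that the summary says earlier models discard through preprocessing.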