Toto: Time Series Optimized Transformer for Observability

Ben Cohen, Emaad Khwaja, Kan Wang, Charles Masson, Elise Ramé, Youssef Doubli, Othmane Abou-Amal
2024-07-12
Abstract: This technical report describes the Time Series Optimized Transformer for Observability (Toto), a new state-of-the-art foundation model for time series forecasting developed by Datadog. In addition to advancing the state of the art on generalized time series benchmarks in domains such as electricity and weather, this model is the first general-purpose time series forecasting foundation model to be specifically tuned for observability metrics. Toto was trained on a dataset of one trillion time series data points, the largest among all currently published time series foundation models. Alongside publicly available time series datasets, 75% of the data used to train Toto consists of fully anonymous numerical metric data points from the Datadog platform. In our experiments, Toto outperforms existing time series foundation models on observability data. It does this while also excelling at general-purpose forecasting tasks, achieving state-of-the-art zero-shot performance on multiple open benchmark datasets.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper introduces Toto (Time Series Optimized Transformer for Observability), a foundation model for time series forecasting that is specifically tuned for observability metrics. Toto was trained on one trillion time series data points, the largest training dataset among currently published time series foundation models, with 75% of the data consisting of anonymous numerical metric data from the Datadog platform. Toto performs well on observability data and also achieves state-of-the-art zero-shot performance on general time series forecasting tasks. It introduces three key innovations:
1. Proportional factorized space-time attention, which groups multivariate time series features efficiently, reducing computational cost while maintaining high accuracy.
2. A Student-T mixture model head, which uses probabilistic modeling to capture the complex dynamics of time series, surpassing traditional methods.
3. Domain-specific training data: in addition to multi-domain time series data, Toto is pre-trained on Datadog observability metrics, which improves its ability to forecast time series with the distinctive characteristics of that domain.
The paper demonstrates that Toto outperforms existing time series foundation models on observability data and achieves the best zero-shot forecasting performance on multiple open benchmark datasets. Toto's architecture is designed for real-time analysis and efficient scaling to large volumes of data, making it particularly suitable for the high-frequency, high-dimensional data common in observability metrics. The paper also discusses the limitations of traditional models such as ARIMA and exponential smoothing, and how pre-training can make Transformer models powerful tools for time series forecasting.
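The computational saving attributed to the factorized attention mechanism in item 1 can be illustrated with a rough count of attention score pairs. This is a minimal sketch and not the paper's implementation: it assumes that time-wise attention runs independently within each variate and that space-wise attention runs across variates at each time step. The function names and the input sizes are hypothetical.

```python
def full_attention_pairs(num_variates, num_steps):
    """Score pairs when every (variate, step) token attends to every other token."""
    tokens = num_variates * num_steps
    return tokens * tokens


def factorized_attention_pairs(num_variates, num_steps):
    """Score pairs for one time-wise layer plus one space-wise layer (assumed layout)."""
    # Time-wise: each variate attends only over its own time steps.
    time_wise = num_variates * num_steps * num_steps
    # Space-wise: each time step attends only over the variates at that step.
    space_wise = num_steps * num_variates * num_variates
    return time_wise + space_wise


# Hypothetical sizes: 32 metrics, 64 time steps (or patches) of context.
full = full_attention_pairs(32, 64)              # 4194304
factorized = factorized_attention_pairs(32, 64)  # 196608
print(full, factorized, full / factorized)
```

Under these assumptions the factorized layout computes roughly 21 times fewer score pairs for this input size, and the gap widens as the number of variates or the context length grows.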
Toto addresses the challenges of observability data, such as high temporal resolution, sparsity, extreme dynamic range, and non-stationarity, through its attention mechanism and probabilistic prediction head, providing more accurate and efficient forecasts.
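How a Student-T mixture output can accommodate an extreme dynamic range can be sketched with a small log-density function. This is an illustrative sketch, not Toto's actual head: it assumes the head emits, for each forecast step, mixture weights, degrees of freedom, locations, and scales per component. All parameter names and values below are hypothetical.

```python
import math


def student_t_logpdf(x, df, loc, scale):
    """Log-density of a location-scale Student-T distribution."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return log_norm - (df + 1) / 2 * math.log1p(z * z / df)


def mixture_logpdf(x, weights, dfs, locs, scales):
    """Log-density of a Student-T mixture; weights are assumed to sum to 1."""
    terms = [math.log(w) + student_t_logpdf(x, d, m, s)
             for w, d, m, s in zip(weights, dfs, locs, scales)]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def gaussian_logpdf(x, loc, scale):
    """Log-density of a Gaussian, used only as a thin-tailed baseline."""
    z = (x - loc) / scale
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z


# Hypothetical head output for one step: a narrow and a wide component.
params = dict(weights=[0.7, 0.3], dfs=[3.0, 3.0],
              locs=[0.0, 0.0], scales=[1.0, 5.0])
spike = 40.0  # an extreme value relative to the typical scale
heavy_tail = mixture_logpdf(spike, **params)
thin_tail = gaussian_logpdf(spike, 0.0, 1.0)
print(heavy_tail, thin_tail)
```

The mixture assigns a far higher log-density to the spike than the unit Gaussian does, so a model trained by maximizing this likelihood is penalized less severely by occasional extreme values. The negative of `mixture_logpdf` would serve as the per-point training loss in such a setup.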