Abstract: This paper addresses learning end-to-end models for time series data that include a temporal alignment step via dynamic time warping (DTW). Existing approaches to differentiable DTW either differentiate through a fixed warping path or apply a differentiable relaxation to the min operator found in the recursive steps used to solve the DTW problem. We instead propose a DTW layer based around bi-level optimisation and deep declarative networks, which we name DecDTW. By formulating DTW as a continuous, inequality constrained optimisation problem, we can compute gradients for the solution of the optimal alignment (with respect to the underlying time series) using implicit differentiation. An interesting byproduct of this formulation is that DecDTW outputs the optimal warping path between two time series rather than the soft approximation recoverable from Soft-DTW. We show that this property is particularly useful for applications where downstream loss functions are defined on the optimal alignment path itself. This naturally occurs, for instance, when learning to improve the accuracy of predicted alignments against ground truth alignments. We evaluate DecDTW on two such applications, namely the audio-to-score alignment task in music information retrieval and the visual place recognition task in robotics, demonstrating state-of-the-art results in both.
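The two recursions contrasted in the abstract can be sketched as follows. This is a minimal NumPy illustration, not DecDTW itself: `dtw` uses the hard min and backtracks the discrete optimal warping path, while `soft_dtw` replaces min with the smoothed soft-min used by Soft-DTW (smoothing parameter `gamma`), which is differentiable but does not by itself yield a single path. The helper names and the squared-Euclidean frame cost are assumptions chosen for illustration.

```python
import numpy as np


def pairwise_sq_dist(x, y):
    # Squared Euclidean cost between every frame of x (n, d) and every frame of y (m, d).
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)


def dtw(cost):
    # Classic DTW recursion with a hard min:
    # R[i, j] = cost[i-1, j-1] + min(R[i-1, j], R[i, j-1], R[i-1, j-1]).
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + min(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1])
    # Backtrack from (n, m) along the minimising predecessors to recover
    # the discrete optimal warping path (0-based frame index pairs).
    path, i, j = [(n - 1, m - 1)], n, m
    while (i, j) != (1, 1):
        preds = {(i - 1, j - 1): R[i - 1, j - 1], (i - 1, j): R[i - 1, j], (i, j - 1): R[i, j - 1]}
        i, j = min(preds, key=preds.get)
        path.append((i - 1, j - 1))
    return R[n, m], path[::-1]


def softmin(values, gamma):
    # Soft-DTW's smooth relaxation of min: -gamma * log(sum(exp(-v / gamma))),
    # evaluated with the log-sum-exp trick for numerical stability.
    z = -np.asarray(values, dtype=float) / gamma
    zmax = z.max()
    return -gamma * (zmax + np.log(np.exp(z - zmax).sum()))


def soft_dtw(cost, gamma):
    # Same recursion as dtw(), with min replaced by softmin; differentiable in cost.
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + softmin([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]


# Usage: y is x with every frame repeated twice, i.e. a simple time warp of x.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))
y = np.repeat(x, 2, axis=0)
C = pairwise_sq_dist(x, y)
d, path = dtw(C)
print(d)                  # 0.0: every frame of y has an exact match in x
print(path[0], path[-1])  # (0, 0) (5, 11)
print(soft_dtw(C, gamma=1.0) <= d)  # True: soft-min never exceeds min
```

Because the soft-min never exceeds the min, `soft_dtw` is a lower bound on `dtw` and approaches it as `gamma` goes to 0. The backtracked `path`, by contrast, is a single hard alignment of the kind DecDTW outputs (DecDTW itself works with a continuous-time version of the problem).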

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper addresses temporal alignment of time series data, with the goal of learning end-to-end models that include a Dynamic Time Warping (DTW) alignment step. Existing differentiable DTW methods either differentiate through a fixed warping path or apply a differentiable relaxation to the min operator in the recursive steps used to solve DTW. In contrast, this paper proposes a new method based on bilevel optimization and deep declarative networks, named DecDTW.

#### Main Contributions

1. **New Formulation of DTW as a Continuous, Inequality-Constrained Optimization Problem**: By reformulating DTW as a continuous optimization problem with inequality constraints, the gradient of the optimal alignment path with respect to the input time series can be computed using implicit differentiation.
2. **Novel DecDTW Layer**: A new DTW layer is defined based on this optimization problem; its forward pass solves for the continuous time warping path.
3. **Utilization of Alignment Path Information**: DecDTW outputs the optimal warping path between two time series rather than a soft approximation, which is particularly useful when downstream loss functions are defined on the alignment path itself, for example when training predicted alignments to match ground-truth alignments.
4. **Experimental Validation**: Experiments on audio-to-score alignment in music information retrieval and visual place recognition in robotics achieve state-of-the-art results.

### Summary of Experimental Results

1. **Audio-to-Score Alignment Experiment**:
   - After training with DecDTW, alignment accuracy improved significantly even for low-level features (such as Mel spectrograms), surpassing even the highly optimized Constant-Q Transform (CQT) features.
   - Table 1 shows the alignment errors (TimeErr and TimeDev) of different methods, with DecDTW exhibiting the best performance across all feature types.
2. **Visual Place Recognition Experiment**:
   - Visual place recognition on image sequences was cast as a temporal alignment problem, and the deep feature extractor was fine-tuned to improve alignment results.
   - Experimental results indicate that DecDTW significantly improves alignment accuracy between query images and database images.

Through these experiments, DecDTW demonstrated superior performance in practical applications, especially in tasks requiring high-precision alignment.

Deep Declarative Dynamic Time Warping for End-to-End Learning of Alignment Paths

Multilevel Dynamic Time Warping: A Parameter-Light Method for Fast Time Series Classification

Learning Discriminative Prototypes with Dynamic Time Warping

DsDTW: Local Representation Learning With Deep soft-DTW for Dynamic Signature Verification

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

DTW-NN: A novel neural network for time series recognition using dynamic alignment between inputs and weights

Sparsification of the Alignment Path Search Space in Dynamic Time Warping

Approximating Dynamic Time Warping with a convolutional neural network on EEG data

Sequential Data Classification by Dynamic State Warping

metricDTW: local distance metric learning in Dynamic Time Warping

Affine and Regional Dynamic Time Warping

GDTW: A Novel Differentiable DTW Loss for Time Series Tasks

Stacked Marginal Time Warping for Temporal Alignment

Evaluating DTW Measures via a Synthesis Framework for Time-Series Data

Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond

Multiscale Manifold Warping

shapeDTW: shape Dynamic Time Warping

Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping

Dynamic Time Warping Based Adversarial Framework for Time-Series Domain

A General Optimization Framework for Dynamic Time Warping

Invariant subspace learning for time series data based on dynamic time warping distance