Deep Declarative Dynamic Time Warping for End-to-End Learning of Alignment Paths

Ming Xu, Sourav Garg, Michael Milford, Stephen Gould
2023-03-20
Abstract: This paper addresses learning end-to-end models for time series data that include a temporal alignment step via dynamic time warping (DTW). Existing approaches to differentiable DTW either differentiate through a fixed warping path or apply a differentiable relaxation to the min operator found in the recursive steps used to solve the DTW problem. We instead propose a DTW layer based around bi-level optimisation and deep declarative networks, which we name DecDTW. By formulating DTW as a continuous, inequality constrained optimisation problem, we can compute gradients for the solution of the optimal alignment (with respect to the underlying time series) using implicit differentiation. An interesting byproduct of this formulation is that DecDTW outputs the optimal warping path between two time series as opposed to the soft approximation recoverable from Soft-DTW. We show that this property is particularly useful for applications where downstream loss functions are defined on the optimal alignment path itself. This naturally occurs, for instance, when learning to improve the accuracy of predicted alignments against ground truth alignments. We evaluate DecDTW on two such applications, namely the audio-to-score alignment task in music information retrieval and the visual place recognition task in robotics, demonstrating state-of-the-art results in both.
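The abstract contrasts DecDTW with two baselines: hard DTW, whose optimal warping path is recovered by backtracking through a dynamic-programming table, and Soft-DTW, which replaces the min in the recursion with a soft-min and yields a smooth cost rather than a single path. The sketch below shows both recursions side by side. It is classical discrete DTW, not DecDTW (which solves a continuous formulation), and it assumes 1-D sequences, a squared-difference local cost, the standard step pattern (match, insertion, deletion) and a smoothing parameter `gamma`; the function names `dtw_path`, `soft_dtw` and `_local_cost` are hypothetical.

```python
import numpy as np


def _local_cost(x, y):
    """Squared-difference cost matrix between 1-D sequences x and y (assumed cost)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (x[:, None] - y[None, :]) ** 2


def dtw_path(x, y):
    """Classical (hard) DTW: minimal cumulative cost and one optimal warping path."""
    cost = _local_cost(x, y)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # acc[i, j]: best cost aligning x[:i] with y[:j]
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the last cell, always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((acc[i - 1, j - 1], i - 1, j - 1),
                      (acc[i - 1, j], i - 1, j),
                      (acc[i, j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return float(acc[n, m]), path[::-1]


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value: the same recursion with min replaced by a soft-min."""
    cost = _local_cost(x, y)
    n, m = cost.shape
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # softmin_gamma(a) = -gamma * log(sum(exp(-a / gamma))), computed stably.
            z = -np.array([r[i - 1, j - 1], r[i - 1, j], r[i, j - 1]]) / gamma
            zmax = z.max()
            softmin = -gamma * (zmax + np.log(np.exp(z - zmax).sum()))
            r[i, j] = cost[i - 1, j - 1] + softmin
    return float(r[n, m])


x, y = [0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]
cost, path = dtw_path(x, y)
print(cost, path)
print(soft_dtw(x, y, gamma=1.0), soft_dtw(x, y, gamma=0.01))
```

Hard DTW returns an explicit path that a downstream loss can be defined on, whereas Soft-DTW returns only a smoothed cost. Because the soft-min never exceeds the min, the Soft-DTW value is a lower bound on the DTW cost and approaches it as `gamma` shrinks.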
Machine Learning, Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses end-to-end learning for time series models that include a temporal alignment step computed with dynamic time warping (DTW). Existing differentiable DTW methods either differentiate through a fixed warping path or apply a differentiable relaxation to the min operator in the recursive steps of the DTW solver (as in Soft-DTW). In contrast, this paper proposes a new method based on bilevel optimization and deep declarative networks, named DecDTW.

#### Main Contributions

1. **Formulation of DTW as a Continuous, Inequality-Constrained Optimization Problem**: Recasting DTW in this form allows implicit differentiation to compute gradients of the optimal alignment path with respect to the underlying time series.
2. **DecDTW Layer**: A new DTW layer built on this optimization problem; its forward pass solves for the continuous time warping path, and gradients are obtained by implicit differentiation.
3. **Use of the Alignment Path Itself**: DecDTW outputs the optimal warping path between two time series rather than the soft approximation recoverable from Soft-DTW. This is particularly useful when downstream loss functions are defined on the alignment path, for example when predicted alignments are compared against ground-truth alignments.
4. **Experimental Validation**: Experiments on audio-to-score alignment in music information retrieval and visual place recognition in robotics achieve state-of-the-art results in both.

### Summary of Experimental Results

1. **Audio-to-Score Alignment Experiment**:
   - After training with DecDTW, alignment accuracy improved significantly even for low-level features such as mel spectrograms, surpassing even the highly engineered Constant-Q Transform (CQT) features.
   - Table 1 reports the alignment errors (TimeErr and TimeDev) of the different methods, with DecDTW performing best across all feature types.
2. **Visual Place Recognition Experiment**:
   - Visual place recognition is cast as a temporal alignment problem between a query image sequence and a database image sequence, and the deep feature extractor is fine-tuned to improve the resulting alignments.
   - Using DecDTW significantly improves the alignment accuracy between query images and database images.

Together, these experiments show that DecDTW performs strongly in practical applications, especially in tasks that require high-precision alignment.
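The implicit-differentiation idea behind the first two contributions can be illustrated on a much smaller problem. The sketch below is a toy example, not DecDTW's formulation: it computes the Euclidean projection of a vector `x` onto non-decreasing sequences (isotonic regression, solved with the pool-adjacent-violators algorithm), chosen only because its monotonicity constraints loosely resemble those on a warping path. It differentiates the solution by treating active inequality constraints as equalities and applying the implicit function theorem to the KKT conditions, which assumes strict complementarity and linearly independent active constraints, and it checks the result against central finite differences. The function names `isotonic_fit`, `implicit_jacobian` and `finite_diff_jacobian` are hypothetical.

```python
import numpy as np


def isotonic_fit(x):
    """Solve min_y 0.5 * ||y - x||^2 s.t. y_1 <= ... <= y_n with pool-adjacent-violators."""
    sums, counts = [], []
    for v in x:
        sums.append(float(v))
        counts.append(1)
        # Merge blocks while the previous block's mean exceeds the current block's mean.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])


def implicit_jacobian(y, tol=1e-12):
    """Jacobian dy*/dx from implicit differentiation of the KKT conditions.

    Constraints y_i - y_{i+1} <= 0 that hold with equality (active) are treated
    as equality constraints A y = 0; inactive constraints are ignored.
    """
    n = len(y)
    H = np.eye(n)    # Hessian of f(x, y) = 0.5 * ||y - x||^2 with respect to y
    B = -np.eye(n)   # derivative of grad_y f = y - x with respect to x
    Hinv = np.linalg.inv(H)
    active = [i for i in range(n - 1) if abs(y[i + 1] - y[i]) <= tol]
    if not active:
        return -Hinv @ B
    A = np.zeros((len(active), n))
    for row, i in enumerate(active):
        A[row, i], A[row, i + 1] = 1.0, -1.0
    # Linear equality constraints: Dy = (H^-1 A^T (A H^-1 A^T)^-1 A H^-1 - H^-1) B
    S = np.linalg.inv(A @ Hinv @ A.T)
    return (Hinv @ A.T @ S @ A @ Hinv - Hinv) @ B


def finite_diff_jacobian(x, eps=1e-6):
    """Central-difference Jacobian of isotonic_fit, used only as a sanity check."""
    n = len(x)
    J = np.zeros((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = eps
        J[:, k] = (isotonic_fit(x + e) - isotonic_fit(x - e)) / (2 * eps)
    return J


x = np.array([3.0, 1.0, 2.5, 5.0, 4.0])
y = isotonic_fit(x)  # pooled blocks: [2, 2], [2.5], [4.5, 4.5]
print(y)
print(np.allclose(implicit_jacobian(y), finite_diff_jacobian(x), atol=1e-6))
```

In a network, the backward pass would multiply the transpose of this Jacobian by the incoming gradient of the loss with respect to `y`. In this toy case the Jacobian simply averages within each pooled block, which is what the finite-difference check confirms.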