Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling

Zheng Nan,Ting Dang,Vidhyasaharan Sethu,Beena Ahmed
2023-12-15
Abstract:Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC is only applied to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent; and the second being that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound for the model log-likelihood, and present computationally tractable forms for implementing them.
Machine Learning
What problem does this paper attempt to address?
This paper attempts to address the problem of how to combine variational modeling and Connectionist Temporal Classification (CTC) methods in sequence modeling tasks to improve the model's generalization ability and maintain the order relationship between input and output sequences. Specifically: 1. **Order Preservation Problem**: In tasks such as speech recognition and handwriting recognition, the order relationship between the input sequence and the target sequence needs to be maintained, meaning that a certain semantic information in the input sequence should appear in the corresponding position in the target sequence. Although existing Attention-based Encoder-Decoders (AEDs) perform well in many tasks, they cannot guarantee the preservation of this order relationship. 2. **Sparse and Discontinuous Latent Space Problem**: Traditional CTC methods and RNN-Transducers (RNN-Ts) perform poorly when handling data variations because their latent space is sparse and discontinuous. When test data is mapped to unexplored areas in the latent space, these models are prone to errors. Although variational models have been successfully applied in various fields to address these issues, there is currently no method that combines variational models with CTC. To solve the above problems, the paper proposes a new variational CTC method. By introducing two assumptions (conditional independence and Markov dependency), two loss functions are derived, allowing the model to maintain the order relationship while improving robustness to data variations. These two loss functions enable direct optimization of the variational lower bound of the model's log-likelihood, thereby mitigating errors caused by the sparsity and discontinuity of the latent space.