Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling

Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed

2023-12-15

Abstract: Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC has so far been applied only to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent, and the second being that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound for the model log-likelihood, and present computationally tractable forms for implementing them.
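CTC preserves order because it scores a target by summing over every frame-level alignment that collapses to it (merge repeated labels, then drop blanks), so labels can only be emitted in target order. The following is a minimal pure-Python sketch of the standard, deterministic CTC forward recursion, checked against brute-force enumeration of alignments. It assumes blank index 0 and already-normalized per-frame probabilities; the function names are illustrative, and this is plain CTC, not the variational loss proposed in the paper.

```python
import itertools
import math
from typing import List, Sequence

BLANK = 0  # assumption: symbol index 0 is the CTC blank


def collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_forward(probs: Sequence[Sequence[float]], target: Sequence[int],
                blank: int = BLANK) -> float:
    """p(target | x) via the CTC forward (alpha) recursion.

    probs[t][k] is the per-frame probability of symbol k at frame t (T >= 1).
    """
    # Extended target with blanks around every label, e.g. [b, y1, b, y2, b].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)
    # alpha[s]: total probability of all partial paths ending at ext[s].
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same extended position
            if s >= 1:
                a += alpha[s - 1]  # advance by one position
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_brute_force(probs, target, blank=BLANK):
    """Reference: sum path probabilities over all alignments collapsing to target."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


if __name__ == "__main__":
    # Toy per-frame distributions over {blank, 1, 2} for T = 3 frames.
    probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]
    print(ctc_forward(probs, [1, 2]), ctc_brute_force(probs, [1, 2]))
```

Both functions should agree up to floating-point rounding. The sketch works in the probability domain for readability; production implementations such as PyTorch's `torch.nn.CTCLoss` operate on log-probabilities to avoid underflow on long sequences.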

Machine Learning

What problem does this paper attempt to address?

This paper addresses how to combine variational modeling with Connectionist Temporal Classification (CTC) in sequence modeling tasks, so that models generalize better while preserving the order relationship between input and output sequences. Specifically:

1. **Order preservation problem**: In tasks such as speech recognition and handwriting recognition, the order of the input sequence must be preserved in the target sequence: information that appears at a given position in the input should appear at the corresponding position in the target. Attention-based encoder-decoders (AEDs) perform well on many tasks, but they cannot guarantee that this order relationship is preserved.
2. **Sparse and discontinuous latent space problem**: Conventional CTC models and RNN transducers (RNN-Ts) handle data variability poorly because their latent space is sparse and discontinuous. When test data is mapped to unexplored regions of the latent space, these models are prone to errors. Variational models have been applied successfully in various fields to address this issue, but no existing method combines variational models with CTC.

To solve these problems, the paper proposes a variational CTC. Starting from two alternative assumptions about the latent variables (conditional independence across time steps, or Markov dependency), it derives two loss functions that preserve the input-output order while improving robustness to data variations. Both loss functions allow direct optimization of the variational lower bound on the model's log-likelihood, with the aim of mitigating errors caused by the sparsity and discontinuity of the latent space.
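For orientation, the two assumptions correspond to two familiar ways of factorizing a sequential variational lower bound. The block below is a generic sketch, not the loss functions derived in the paper, whose tractable forms may differ. All symbols are introduced here as assumptions: $z_{1:T}$ are per-frame latents, $q_\phi$ is an approximate posterior (the bound holds for any choice of $q_\phi$), the model is $p_\theta(y \mid x) = \int p_\theta(y \mid z_{1:T})\, p(z_{1:T} \mid x)\, dz_{1:T}$, $\mathcal{B}$ is the CTC collapse map, and the output at frame $t$ is assumed to depend only on $z_t$.

$$
\log p_\theta(y \mid x) \;\ge\; \mathbb{E}_{q_\phi(z_{1:T} \mid x)}\!\left[\log p_\theta(y \mid z_{1:T})\right] - \mathrm{KL}\!\left(q_\phi(z_{1:T} \mid x) \,\middle\|\, p(z_{1:T} \mid x)\right),
\qquad
p_\theta(y \mid z_{1:T}) = \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p_\theta(\pi_t \mid z_t)
$$

Conditional independence, $q_\phi(z_{1:T} \mid x) = \prod_{t} q_\phi(z_t \mid x)$ and $p(z_{1:T} \mid x) = \prod_{t} p(z_t \mid x)$, makes the KL term a sum of per-frame terms:

$$
\mathrm{KL}\!\left(q_\phi(z_{1:T} \mid x) \,\middle\|\, p(z_{1:T} \mid x)\right) = \sum_{t=1}^{T} \mathrm{KL}\!\left(q_\phi(z_t \mid x) \,\middle\|\, p(z_t \mid x)\right)
$$

Markov dependency, $q_\phi(z_{1:T} \mid x) = \prod_{t} q_\phi(z_t \mid z_{t-1}, x)$ and $p(z_{1:T} \mid x) = \prod_{t} p(z_t \mid z_{t-1}, x)$ with a fixed initial state $z_0$, gives per-frame terms that must be averaged over the previous latent:

$$
\mathrm{KL}\!\left(q_\phi(z_{1:T} \mid x) \,\middle\|\, p(z_{1:T} \mid x)\right) = \sum_{t=1}^{T} \mathbb{E}_{q_\phi(z_{t-1} \mid x)}\!\left[\mathrm{KL}\!\left(q_\phi(z_t \mid z_{t-1}, x) \,\middle\|\, p(z_t \mid z_{t-1}, x)\right)\right]
$$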


Connectionist Temporal Classification with Maximum Entropy Regularization

Variational Temporal Abstraction

Variational Bi-LSTMs

Temporal Classification Constraint for Improving Handwritten Mathematical Expression Recognition

LV-CTC: Non-autoregressive ASR with CTC and latent variable models

Temporal Difference Variational Auto-Encoder

Variational Continual Test-Time Adaptation

Temporal-Difference Variational Continual Learning

Delay-penalized CTC implemented based on Finite State Transducer

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

Comparison of Decoding Strategies for CTC Acoustic Models

Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

Condition-transforming Variational Autoencoder for Conversation Response Generation

Variational Classification

Learning Conditional Generative Models for Temporal Point Processes

CR-CTC: Consistency regularization on CTC for improved speech recognition

Gradient-free variational learning with conditional mixture networks

Variational learning for switching state-space models

Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition

Conditional Flow Variational Autoencoders for Structured Sequence Prediction