Abstract: Joint source-channel (JSC) decoding techniques capitalize on the residual redundancy that typically remains after source encoding. These methods, which include MAP- and MMSE-based decoders, estimate the sequence of encoded source symbols from statistical knowledge of both the channel and the encoded source. Generally, these techniques are based on a Markov model for the quantized source and, thus, on a hidden Markov model for the source-channel tandem. The number of states in the hidden Markov model, and thus the computational and storage complexity, grows exponentially with the order K of the Markov model: the complexity is O(N^{K+1} T), with N the number of source quantization levels and T the length of the data sequence. Thus, to retain implementable complexity, low-order models (K = 1, 2) are typically used, at the expense of model accuracy. In this thesis, we propose methods to bridge the performance-complexity gap, i.e., to provide solutions that perform better than a low-order decoder while incurring only modest increases in complexity.
We first propose using the improved iterative scaling (IIS) algorithm to learn the maximum entropy a posteriori probability from a large group of pairwise constraints. Compared with learning the model directly from frequency counts, this significantly reduces the training set size needed to avoid overfitting. However, high computational complexity remains unavoidable with a high-order model. Therefore, we suggest conditioning on neighboring noisy symbols, which incurs only a modest increase in computational complexity relative to a low-order model. We further apply the idea of using IIS to approximate high-order conditioning in the context of conditional entropy-constrained encoding.
We then propose a second decoding approach, which consists of two stages: (1) low-order JSC decoding, followed by (2) linear FIR filtering of the JSC-decoded signal. The linear filter is chosen to provide an optimal (least-squares) estimate of the original source. This approach provides an approximate way to increase the effective order of the decoder while retaining quite manageable complexity. The new approach is demonstrated to significantly improve upon standard MMSE-based JSC decoding performance, both for nonpredictive source coding (e.g., vector quantization) and for predictive source coding (DPCM).
In our third approach, we model the source sequence as a hidden Markov model rather than the commonly used Markov model. This gives a source model with infinite memory, whereas a Markov source model has only finite memory. This translates to lower source entropy and better decoding performance than a SAMMSE decoder with comparable memory complexity, although our computational complexity is higher. As an extension, we also model the source data as a Markov random field, with our sequence-based mean field annealing method as the joint source-channel decoder. This method relaxes the assumption in standard mean field annealing that pixels are independent, assuming instead that independence holds only between the rows of an image. By introducing more dependency between symbols into our source model, we further exploit this dependency to combat channel noise.
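The second stage of the two-stage approach described above (a least-squares FIR filter applied to the decoder output) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the function names `fit_fir_least_squares` and `apply_fir`, the 5-tap centered window, the edge padding, the first-order Gauss-Markov source, and the use of "source plus white noise" as a stand-in for the output of a low-order JSC decoder. Since fitting requires the original source, the taps would presumably be learned offline on training data.

```python
import numpy as np

def _windows(signal, num_taps):
    """Stack centered sliding windows of `signal` as rows (edge-padded)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so that windows are centered")
    half = num_taps // 2
    padded = np.pad(signal, half, mode="edge")
    n = len(signal)
    # Column j holds the signal shifted by (j - half) samples.
    return np.stack([padded[j:j + n] for j in range(num_taps)], axis=1)

def fit_fir_least_squares(decoded, source, num_taps=5):
    """Find FIR taps minimizing ||X @ taps - source||^2 over the training data."""
    X = _windows(np.asarray(decoded, dtype=float), num_taps)
    taps, *_ = np.linalg.lstsq(X, np.asarray(source, dtype=float), rcond=None)
    return taps

def apply_fir(decoded, taps):
    """Filter a decoded signal with previously fitted taps."""
    return _windows(np.asarray(decoded, dtype=float), len(taps)) @ taps

# Hypothetical training data: a correlated source and a noisy "decoded" version.
rng = np.random.default_rng(0)
T = 2000
source = np.zeros(T)
for t in range(1, T):
    source[t] = 0.9 * source[t - 1] + rng.normal()
decoded = source + rng.normal(scale=1.0, size=T)  # stand-in for stage-1 output

taps = fit_fir_least_squares(decoded, source, num_taps=5)
filtered = apply_fir(decoded, taps)

mse_before = np.mean((decoded - source) ** 2)
mse_after = np.mean((filtered - source) ** 2)
print("MSE before filtering:", mse_before)
print("MSE after filtering: ", mse_after)
```

On the training data the filtered error cannot exceed the unfiltered error, because the identity filter (center tap 1, all others 0) is one of the candidates the least-squares fit considers. Any gain on unseen data depends on how much residual correlation the first-stage decoder leaves unexploited.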

Length bias in Encoder Decoder Models and a Case for Global Conditioning

Learning to Decode for Future Success

How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities

Calibrating Sequence likelihood Improves Conditional Language Generation

Rethinking the adaptive relationship between Encoder Layers and Decoder Layers

The Implicit Length Bias of Label Smoothing on Beam Search Decoding

High Order Joint Source Channel Decoding and Conditional Entropy Encoding: Novel Bridging Techniques Between Performance and Complexity

Controlling Output Length in Neural Encoder-Decoders

On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Improve Long-term Memory Learning Through Rescaling the Error Temporally

Length Generalization of Causal Transformers without Position Encoding

Closed-Book Training to Improve Summarization Encoder Memory

MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation

Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning

Multiscale sequence modeling with a learned dictionary

An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search

An Optimization Scheme for Segmented-Memory Neural Network

Following Length Constraints in Instructions

Understanding and Mitigating Tokenization Bias in Language Models