Deep Equilibrium Non-Autoregressive Sequence Learning

Zaixiang Zheng, Yi Zhou, Hao Zhou
DOI: https://doi.org/10.18653/v1/2023.findings-acl.747
2023-01-01
Abstract: In this work, we argue that non-autoregressive (NAR) sequence generative models can equivalently be regarded as an iterative refinement process towards the target sequence, implying an underlying dynamical system of NAR models: z = f(z, x) → y. In such a way, the optimal prediction of a NAR model should be the equilibrium state of its dynamics if given infinitely many iterations. However, this is infeasible in practice due to limited computational and memory budgets. To this end, we propose DEQNAR to directly solve for the equilibrium state of NAR models based on deep equilibrium networks (Bai et al., 2019) with black-box root-finding solvers and back-propagate through the equilibrium point via implicit differentiation with constant memory. We conduct extensive experiments on four WMT machine translation benchmarks. Our main findings show that DEQNAR can indeed converge to a more accurate prediction and is a general-purpose framework that consistently helps yield substantial improvement for several strong NAR backbones.
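The equilibrium view z = f(z, x) and the implicit-differentiation step can be illustrated with a minimal scalar sketch. Everything here is an assumption made for illustration and is not the authors' model or code: the toy map f(z, x, w) = tanh(w·z + x), the names solve_equilibrium and implicit_grad_w, and the parameter values. Naive fixed-point iteration stands in for the black-box root-finding solvers mentioned in the abstract. The gradient of the equilibrium z* with respect to w is computed from the equilibrium point alone, dz*/dw = (∂f/∂w) / (1 − ∂f/∂z), without storing intermediate iterates, and is compared against a finite-difference estimate.

```python
import math

def f(z, x, w):
    """Toy refinement step (hypothetical, scalar): applied repeatedly to z."""
    return math.tanh(w * z + x)

def solve_equilibrium(x, w, z0=0.0, tol=1e-12, max_iter=1000):
    """Naive fixed-point iteration z <- f(z, x, w); a stand-in for a root solver."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def implicit_grad_w(z_star, x, w):
    """dz*/dw from the implicit function theorem, using only the equilibrium."""
    s = 1.0 - math.tanh(w * z_star + x) ** 2  # tanh' evaluated at the fixed point
    df_dz = s * w       # partial derivative of f with respect to z
    df_dw = s * z_star  # partial derivative of f with respect to w
    return df_dw / (1.0 - df_dz)

# Assumed toy values; |w| < 1 keeps the map contractive so iteration converges.
x, w = 0.3, 0.5
z_star = solve_equilibrium(x, w)
residual = abs(f(z_star, x, w) - z_star)

grad = implicit_grad_w(z_star, x, w)

# Finite-difference check: re-solve the equilibrium at perturbed w.
eps = 1e-6
fd = (solve_equilibrium(x, w + eps) - solve_equilibrium(x, w - eps)) / (2 * eps)

print(f"z* = {z_star:.6f}, residual = {residual:.2e}")
print(f"implicit grad = {grad:.6f}, finite difference = {fd:.6f}")
```

In this sketch the backward pass needs only z* and the local derivatives of f, which is the sense in which memory stays constant regardless of how many solver iterations the forward pass used. In a real sequence model z would be a tensor of hidden states and the division would become a linear solve against the Jacobian.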