WEST: Word Encoded Sequence Transducers

Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub
DOI: https://doi.org/10.1109/icassp.2019.8683694
2019-05-01
Abstract: Most of the parameters in large-vocabulary models are used in the embedding layer, which maps categorical features to vectors, and in the softmax layer, which holds the classification weights. This is a bottleneck for memory-constrained on-device training applications such as federated learning and on-device inference applications such as automatic speech recognition (ASR). One way to compress the embedding and softmax layers is to replace larger units such as words with smaller sub-units such as characters. However, sub-unit models often perform poorly compared to larger-unit models. We propose WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain-dependent sub-units, and demonstrate that this transduction can lead to significant compression without compromising performance.
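The abstract describes the encoding only at a high level, so the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes each word gets a distinct random code of `code_len` sub-unit ids drawn from an alphabet of `num_subunits` symbols, and that a word's input vector is the sum of position-specific sub-unit embeddings (one possible composition; the paper's model may combine sub-units differently). All function names are hypothetical. The point is the parameter arithmetic: a word-level table costs `vocab_size * dim`, while the sub-unit tables cost `code_len * num_subunits * dim`.

```python
import random
from typing import Dict, List, Sequence, Tuple


def assign_random_codes(vocab: Sequence[str], num_subunits: int, code_len: int,
                        seed: int = 0) -> Dict[str, Tuple[int, ...]]:
    # Hypothetical helper: give every word a distinct random sequence of sub-unit ids.
    if num_subunits ** code_len < len(vocab):
        raise ValueError("num_subunits ** code_len must be >= vocabulary size")
    rng = random.Random(seed)
    used = set()
    codes: Dict[str, Tuple[int, ...]] = {}
    for word in vocab:
        code = tuple(rng.randrange(num_subunits) for _ in range(code_len))
        while code in used:  # resample on collision so every word keeps a unique code
            code = tuple(rng.randrange(num_subunits) for _ in range(code_len))
        used.add(code)
        codes[word] = code
    return codes


def make_tables(num_subunits: int, code_len: int, dim: int,
                seed: int = 0) -> List[List[List[float]]]:
    # Assumption: one small (num_subunits x dim) embedding table per code position.
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_subunits)]
            for _ in range(code_len)]


def embed(code: Tuple[int, ...], tables: List[List[List[float]]]) -> List[float]:
    # Word vector = sum of the position-specific sub-unit vectors named by its code.
    dim = len(tables[0][0])
    out = [0.0] * dim
    for pos, unit in enumerate(code):
        for i, v in enumerate(tables[pos][unit]):
            out[i] += v
    return out


def param_counts(vocab_size: int, num_subunits: int, code_len: int,
                 dim: int) -> Tuple[int, int]:
    # (word-level embedding parameters, sub-unit embedding parameters)
    return vocab_size * dim, code_len * num_subunits * dim


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "on", "mat"]
    codes = assign_random_codes(vocab, num_subunits=4, code_len=2)
    tables = make_tables(num_subunits=4, code_len=2, dim=8)
    print(codes["cat"], len(embed(codes["cat"], tables)))
    print(param_counts(100_000, 256, 3, 512))  # (51200000, 393216)
```

With a 100,000-word vocabulary and 512-dimensional vectors, three positions over 256 sub-units (256^3 codes, far more than the vocabulary needs) shrink the table from about 51.2M to about 0.39M parameters in this toy accounting. The same idea would apply on the output side, where a softmax over every word is replaced by predicting a short sequence of sub-unit symbols.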