Compressing the Embedding Matrix by a Dictionary Screening Approach in Text Classification.

Jing Zhou,Xinru Jing,Muyu Liu,Hansheng Wang
DOI: https://doi.org/10.1007/978-3-031-33374-3_36
2023-01-01
Abstract:In this paper, we propose a dictionary screening method for embedding compression in text classification. The key point is to evaluate the importance of each keyword in the dictionary. To this end, we first train a pre-specified recurrent neural network-based model using a full dictionary. This leads to a benchmark model, which we use to obtain the predicted class probabilities for each sample in a dataset. Next, to evaluate the impact of each keyword in affecting the predicted class probabilities, we develop a novel method for assessing the importance of each keyword in a dictionary. Consequently, each keyword can be screened, and only the most important keywords are reserved. With these screened keywords, a new dictionary with a considerably reduced size can be constructed. Accordingly, the original text sequence can be substantially compressed. The proposed method leads to significant reductions in terms of parameters, average text sequence, and dictionary size. Meanwhile, the prediction power remains very competitive compared to the benchmark model. Extensive numerical studies are presented to demonstrate the empirical performance of the proposed method.
What problem does this paper attempt to address?