Abstract: Fine-tuning large pre-trained language models (LLMs) on particular datasets is a commonly employed strategy in Natural Language Processing (NLP) classification tasks. However, this approach usually results in a loss of the model's generalizability. In this paper, we present a framework that maintains generalizability and enhances performance on the downstream task by utilizing task-specific context attribution. We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper. The specific concept operator is optimized during the supervised learning stage via novel loss functions. The proposed framework demonstrates that context attribution of the text representation for each task objective can improve the capacity of the discriminator function and thus achieve better performance on the classification task. Experimental results on three datasets, namely HateXplain, IMDB reviews, and Social Media Attributions, illustrate that the proposed model attains superior accuracy and generalizability. Specifically, for the non-fine-tuned BERT on the HateXplain dataset, we observe an 8% improvement in accuracy and a 10% improvement in F1-score. On the IMDB dataset, the fine-tuned state-of-the-art XLNet is outperformed by 1% in both accuracy and F1-score. Furthermore, in an out-of-domain cross-dataset test, DistilBERT fine-tuned on the IMDB dataset in conjunction with the proposed model improves the F1-score on the HateXplain dataset by 7%. For the Social Media Attributions dataset of YouTube comments, we observe a 5.2% increase in F1-score. The proposed framework is implemented with PyTorch and provided open-source on GitHub.

What problem does this paper attempt to address?

The paper addresses a known weakness of fine-tuning: in natural language processing (NLP) classification tasks, fine-tuning large pre-trained language models (LLMs) on specific datasets usually degrades the model's generalization ability. The authors propose a new framework, the **Space Model**, which aims to preserve the generalization ability of the pre-trained model without fine-tuning and to improve downstream task performance by using task-specific context attribution.

### Main contributions of the paper

1. **Proposed a new model fine-tuning framework**: The Space Model realizes context attribution by introducing task-specific concept operators that project text representations into the latent concept space. These concept operators are optimized through a new loss function during the supervised learning stage.
2. **Improved the generalization ability and performance of the model**: Experimental results show that the Space Model improves accuracy and also enhances the model's generalization ability on multiple datasets (HateXplain, IMDB reviews, and Social Media Attributions).
3. **Introduced a new loss function**: To ensure that concepts are independent in context attribution, the authors introduce the Intra-Space loss, which helps improve the stability and zero-shot ability of the model.

### Specific improvements

- **Performance of non-fine-tuned BERT on the HateXplain dataset**: Accuracy increases by 8% and the F1 score increases by 10%.
- **Performance on the IMDB dataset**: With XLNet as the base model, after full model adaptation, the Space Model improves both accuracy and the F1 score by 1%.
- **Cross-dataset testing**: DistilBERT fine-tuned on the IMDB dataset and combined with the Space Model improves the F1 score on the HateXplain dataset by 7%.
- **Performance on the Social Media Attributions dataset**: Compared with the manually supervised method, the Space Model improves the F1 score by 5.2% without additional supervision.

### Method overview

- **Context attribution**: The concept operator projects the context embedding into the concept space to form the context attribution. These projections are used to calculate the similarity between the original sentence and the concept projection, which guides classification.
- **Concept projection**: Each category corresponds to a concept space operator, which projects the context embedding into the concept space to generate new concept embeddings. The dimensions of these embeddings are defined by the latent space.
- **Loss function**: The model mainly optimizes the cross-entropy loss and adds the Intra-Space loss to ensure that the concept embeddings are independent in the concept space.

### Experimental setup

- **Datasets**: HateXplain, IMDB reviews, and Social Media Attributions.
- **Base models**: BERT, DistilBERT, and XLNet.
- **Evaluation metrics**: Accuracy, macro-averaged F1 score, and recall.
- **Training configuration**: Adam optimizer with a learning rate of 2e-4, a maximum sequence length of 256, and a batch size of 8 (4 for the XLNet large model).

### Experimental results

- **IMDB dataset**: The Space Model outperforms the baseline model in both accuracy and macro-averaged F1 score.
- **HateXplain dataset**: The Space Model performs well in zero-shot testing, especially in macro-averaged F1 score.
- **Cross-dataset testing**: The model fine-tuned on the IMDB dataset also improves performance on the HateXplain dataset.

In conclusion, the paper addresses the decline in generalization ability that comes with fine-tuning pre-trained language models by introducing the Space Model framework, and it reports improved accuracy and F1 scores over the baselines on the three datasets above.
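
The method overview above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name `ConceptSpaceHead`, the `tanh` activation, mean pooling over tokens, the latent size of 64, and the penalty weight of 0.1 are all assumptions made for illustration. In particular, `overlap_penalty` is a simple cosine-similarity stand-in for the idea of keeping concept embeddings independent, not the paper's Intra-Space loss, whose formula is not given in this summary. Random tensors stand in for the output of a frozen encoder (768 is the hidden size of BERT-base), while the optimizer, learning rate, sequence length, and batch size follow the training configuration listed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptSpaceHead(nn.Module):
    """Hypothetical head: one linear concept operator per class on top of encoder embeddings."""

    def __init__(self, hidden_dim: int, latent_dim: int, num_classes: int):
        super().__init__()
        # One bias-free linear map per class plays the role of a "concept operator".
        self.operators = nn.ModuleList(
            [nn.Linear(hidden_dim, latent_dim, bias=False) for _ in range(num_classes)]
        )
        # The classifier sees the concatenated concept embeddings.
        self.classifier = nn.Linear(num_classes * latent_dim, num_classes)

    def forward(self, token_embeddings: torch.Tensor):
        # token_embeddings: (batch, seq_len, hidden_dim)
        # Project into each concept space, squash with tanh (assumption), mean-pool over tokens.
        pooled = [torch.tanh(op(token_embeddings)).mean(dim=1) for op in self.operators]
        concepts = torch.stack(pooled, dim=1)  # (batch, num_classes, latent_dim)
        logits = self.classifier(concepts.flatten(start_dim=1))
        return logits, concepts


def overlap_penalty(concepts: torch.Tensor) -> torch.Tensor:
    """Stand-in regularizer (assumption): mean squared cosine similarity between different concepts."""
    normed = F.normalize(concepts, dim=-1)
    sim = normed @ normed.transpose(1, 2)  # (batch, num_classes, num_classes)
    num = sim.size(1)
    off_diag = sim - torch.eye(num, device=sim.device)  # remove self-similarity
    return off_diag.pow(2).sum(dim=(1, 2)).mean() / max(num * (num - 1), 1)


torch.manual_seed(0)
batch_size, seq_len, hidden_dim = 8, 256, 768
# Random stand-in for frozen encoder output; a real run would use BERT/DistilBERT/XLNet states.
embeddings = torch.randn(batch_size, seq_len, hidden_dim)
labels = torch.randint(0, 2, (batch_size,))

head = ConceptSpaceHead(hidden_dim, latent_dim=64, num_classes=2)
optimizer = torch.optim.Adam(head.parameters(), lr=2e-4)

# One training step: cross-entropy plus the weighted stand-in penalty.
logits, concepts = head(embeddings)
loss = F.cross_entropy(logits, labels) + 0.1 * overlap_penalty(concepts)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because only the head's parameters are passed to the optimizer, the encoder that produced the embeddings would stay untouched in this setup, which mirrors the summary's point about adapting a model without fine-tuning it.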

Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs
