Abstract: Fine-tuning large pre-trained language models (LLMs) on particular datasets is a commonly employed strategy in Natural Language Processing (NLP) classification tasks. However, this approach usually results in a loss of the model's generalizability. In this paper, we present a framework that maintains generalizability and enhances performance on the downstream task by utilizing task-specific context attribution. We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper. This concept operator is optimized during the supervised learning stage via novel loss functions. The proposed framework demonstrates that context attribution of the text representation for each task objective can improve the capacity of the discriminator function and thus achieve better performance on the classification task. Experimental results on three datasets, namely HateXplain, IMDB reviews, and Social Media Attributions, illustrate that the proposed model attains superior accuracy and generalizability. Specifically, for the non-fine-tuned BERT on the HateXplain dataset, we observe an 8% improvement in accuracy and a 10% improvement in F1-score. For the IMDB dataset, the fine-tuned state-of-the-art XLNet is outperformed by 1% in both accuracy and F1-score. Furthermore, in an out-of-domain cross-dataset test, DistilBERT fine-tuned on the IMDB dataset in conjunction with the proposed model improves the F1-score on the HateXplain dataset by 7%. For the Social Media Attributions dataset of YouTube comments, we observe a 5.2% increase in F1-score. The proposed framework is implemented with PyTorch and provided open-source on GitHub.
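
The core idea in the abstract, a linear map from a frozen text representation into a latent concept space, can be illustrated with a minimal sketch. This is not the authors' PyTorch implementation: the function names (`make_concept_operator`, `context_attribution`), the dimensions, and the random initialization are all assumptions made for illustration, and the loss functions used to optimize the operator are not described in the abstract, so they are omitted here.

```python
import random


def make_concept_operator(hidden_dim, concept_dim, seed=0):
    """Hypothetical concept operator: a concept_dim x hidden_dim matrix.

    In the paper the operator is learned during supervised training; here it
    is only randomly initialized, which is an assumption for illustration.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        for _ in range(concept_dim)
    ]


def context_attribution(operator, embedding):
    """Apply the linear map: one dot product per row of the operator."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in operator]


# Stand-in for a sentence embedding from a frozen transformer
# (illustrative values, not real model output).
embedding = [0.5, -1.0, 0.25, 2.0]

operator = make_concept_operator(hidden_dim=4, concept_dim=2)
concepts = context_attribution(operator, embedding)
print(len(concepts))  # one value per latent concept dimension
```

Because the transformer stays frozen in this setup, only the operator (and whatever classifier consumes `concepts`) would need to be trained, which is consistent with the abstract's claim of improving performance without fine-tuning the underlying model.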

Interpreting & Improving Pretrained Language Models: A Probabilistic Conceptual Approach

Variational Language Concepts for Interpreting Foundation Language Models

Explaining Language Models' Predictions with High-Impact Concepts

Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary

Crafting Large Language Models for Enhanced Interpretability

A Study of Pre-trained Language Models in Natural Language Processing

Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding

Recent Advances in Pre-trained Language Models: Why Do They Work and How Do They Work

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Can LLMs facilitate interpretation of pre-trained language models?

Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Concept Bottleneck Large Language Models

Interpreting Deep Learning Models in Natural Language Processing: A Review

How Does Pretraining Improve Discourse-Aware Translation?

Hierarchical Interpretation of Neural Text Classification

Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

Concept Bottleneck Language Models For protein design

Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

Large Language Models are Interpretable Learners