Abstract: Natural language understanding (NLU) with neural network pipelines often requires context that is not present in the input data alone. Prior research has shown that NLU benchmarks are susceptible to gaming by neural models, which exploit statistical artifacts in the encoded external knowledge to artificially inflate performance on downstream tasks. Our proposed approach, the Recap, Deliberate, and Respond (RDR) paradigm, addresses this issue by incorporating three distinct objectives into the neural network pipeline. First, the Recap objective paraphrases the input text with a paraphrasing model to summarize and capture its essence. Second, the Deliberation objective encodes external graph information about the entities mentioned in the input text with a graph embedding model. Finally, the Respond objective uses a classification head that takes the representations from the Recap and Deliberation modules and produces the final prediction. By cascading these three models and minimizing a combined loss, we mitigate the potential for gaming the benchmark and obtain a robust method for capturing the underlying semantic patterns, which enables accurate predictions. To evaluate the effectiveness of RDR, we test it on multiple GLUE benchmark tasks. Our results show improved performance over competitive baselines, with gains of up to 2% on standard metrics. Furthermore, we analyze the evidence of semantic understanding exhibited by RDR models, highlighting their ability to avoid gaming the benchmark and instead capture the true underlying semantic patterns.
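The Recap, Deliberate, and Respond cascade can be illustrated with a minimal, stdlib-only Python sketch. It assumes, rather than reproduces, several details the abstract leaves open: the Recap and Deliberation modules each yield a fixed-size vector, the entity graph embeddings are mean-pooled, the Respond head is a single linear layer over the concatenation of the two vectors, and the combined loss is a weighted sum whose weights `alpha` and `beta` are hypothetical. All function names and numbers are illustrative and are not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length vectors (assumed pooling of entity graph embeddings)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def respond_logits(recap_vec: Sequence[float], deliberate_vec: Sequence[float],
                   weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Hypothetical Respond head: one linear layer over [Recap ; Deliberation]."""
    features = list(recap_vec) + list(deliberate_vec)
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the gold label."""
    return -math.log(probs[label])


def combined_loss(recap_loss: float, deliberate_loss: float, respond_loss: float,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed weighted sum of the three objectives; alpha and beta are hypothetical."""
    return respond_loss + alpha * recap_loss + beta * deliberate_loss


# Toy forward pass with made-up numbers (not results from the paper).
recap_vec = [0.2, -0.1]                        # stand-in for the paraphrase-model representation
entity_embeddings = [[1.0, 0.0], [0.0, 1.0]]   # stand-in graph embeddings for two entities
deliberate_vec = mean_pool(entity_embeddings)  # [0.5, 0.5]
weights = [[1.0, 0.0, 0.5, 0.0],
           [0.0, 1.0, 0.0, 0.5]]
bias = [0.0, 0.0]

probs = softmax(respond_logits(recap_vec, deliberate_vec, weights, bias))
respond_loss = cross_entropy(probs, label=0)
# The Recap and Deliberation losses would come from their own models; they are fixed here.
loss = combined_loss(recap_loss=0.8, deliberate_loss=0.3, respond_loss=respond_loss,
                     alpha=0.5, beta=0.5)
```

Summing the losses means a single gradient step updates all three cascaded components together, which is one plausible reading of "minimizing a combined loss"; the paper may weight or schedule the objectives differently.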

RDR: The Recap, Deliberate, and Respond Method for Enhanced Language Understanding