Leveraging Multiple Teachers for Test-Time Adaptation of Language-Guided Classifiers

Kangda Wei, Sayan Ghosh, Rakesh R. Menon, Shashank Srivastava
DOI: https://doi.org/10.48550/arXiv.2311.07538
2023-11-14
Abstract: Recent approaches have explored language-guided classifiers capable of classifying examples from novel tasks when provided with task-specific natural language explanations, instructions or prompts (Sanh et al., 2022; R. Menon et al., 2022). While these classifiers can generalize in zero-shot settings, their task performance often varies substantially between different language explanations in unpredictable ways (Lu et al., 2022; Gonen et al., 2022). Also, current approaches fail to leverage unlabeled examples that may be available in many scenarios. Here, we introduce TALC, a framework that uses data programming to adapt a language-guided classifier for a new task during inference when provided with explanations from multiple teachers and unlabeled test examples. Our results show that TALC consistently outperforms a competitive baseline from prior work by an impressive 9.3% (relative improvement). Further, we demonstrate the robustness of TALC to variations in the quality and quantity of provided explanations, highlighting its potential in scenarios where learning from multiple teachers or a crowd is involved. Our code is available at: https://github.com/WeiKangda/TALC.git
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper targets three limitations of existing language-guided classifiers when they are applied to new tasks:

1. **No weighting strategy for multi-source language supervision**: Existing language-guided classifiers cannot effectively weigh language guidance from multiple sources (teachers). When several teachers provide different natural-language explanations, these classifiers have no systematic way to determine which explanations are more reliable or useful.
2. **Failure to utilize unlabeled data**: At inference time, existing methods do not exploit unlabeled examples that may be available. This limits the model's adaptability to new tasks, especially in zero-shot or few-shot settings.
3. **Impact of explanation quality**: How existing methods respond to explanations of varying quality, and in particular to the introduction of low-quality explanations, has not been fully explored. High-quality explanations can significantly improve model performance, while low-quality ones may introduce noise and degrade it.

To address these problems, the authors propose TALC (Test-time Adaptation of Language-guided Classifiers), a framework based on data programming that adapts a classifier to a new task at inference time. TALC improves on existing methods in the following ways:

- **Aggregation of multi-teacher explanations**: TALC integrates natural-language explanations from multiple teachers and determines the final predicted label through label aggregation. This lets the model make better use of explanations from multiple sources.
- **Utilization of unlabeled data**: TALC uses unlabeled data for pseudo-label generation and aggregation, which strengthens the model's generalization at inference time.
- **Robustness to explanation quality**: TALC is robust to variations in the quantity and quality of the provided explanations and maintains good performance as either changes.

Specifically, TALC proceeds in the following steps:

1. **Input and pre-processing**: Given a set of task-specific natural-language explanations \( E=\{e_1, e_2,\ldots,e_m\} \) for a new task and examples \( \{X_i\in X_{\text{test}}\} \) from the test set, the base language-guided classifier MLC makes a prediction for each explanation-example pair, producing a label matrix \( M \).
2. **Label aggregation**: A label aggregator \( L_{\text{agg}} \) is trained with data-programming techniques to model the dependency between the latent (unobserved) true label \( Y \) and the label matrix \( M \). The aggregator defines the joint probability distribution

   \[ P(X, E, Y; \text{MLC})\propto\exp(w^T\phi(X, E, Y, \text{MLC})) \]

   where \( \phi \) is a feature vector and \( w \) is a weight vector. The aggregator learns \( w \) by maximizing the log-likelihood and uses Gibbs sampling for the final prediction.
3. **Adaptation and evaluation**: The proportion \( \alpha \) that sets the size of the adaptation set lets TALC adapt with varying amounts of unlabeled data.

Experimental results show that TALC significantly outperforms existing baseline methods on multiple real-world classification tasks, with an average accuracy improvement of about 3.3%; the abstract states the gain over a competitive baseline as 9.3% in relative terms.

In conclusion, TALC offers an effective way to overcome the limitations of existing language-guided classifiers on new tasks, particularly by exploiting explanations from multiple teachers and unlabeled test data.
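The three steps above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes \( \phi \) contains only one accuracy indicator per explanation (whether that explanation's prediction equals \( Y \)), and it fits \( w \) with closed-form EM updates rather than the Gibbs-sampling procedure the summary describes. It also assumes \( \alpha \) is the fraction of test examples used for adaptation. All function names (`build_label_matrix`, `split_adaptation`, `fit_aggregator`, `posterior`) and the toy classifier are hypothetical.

```python
import numpy as np

def build_label_matrix(classifier, examples, explanations):
    """Step 1: M[i, j] is the label predicted for example i under explanation j."""
    return np.array([[classifier(x, e) for e in explanations] for x in examples])

def split_adaptation(n_examples, alpha, seed=0):
    """Step 3 (assumed reading): indices of a random alpha-fraction of the test set."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(alpha * n_examples)))
    return rng.permutation(n_examples)[:k]

def one_hot(M, n_classes):
    """votes[i, j, c] = 1.0 if explanation j predicted class c for example i."""
    return np.stack([M == c for c in range(n_classes)], axis=2).astype(float)

def posterior(M, w, n_classes):
    """P(Y = c | row of M) proportional to exp(sum_j w_j * [M_ij == c])."""
    scores = (one_hot(M, n_classes) * w[None, :, None]).sum(axis=1)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

def fit_aggregator(M, n_classes, n_iters=50):
    """Step 2 (simplified): EM on the marginal likelihood of M with Y latent."""
    votes = one_hot(M, n_classes)
    post = votes.sum(axis=1) / M.shape[1]  # initialise with vote fractions
    for _ in range(n_iters):
        # M-step: expected accuracy of each explanation under the current posterior.
        acc = (post[:, None, :] * votes).sum(axis=(0, 2)) / M.shape[0]
        acc = np.clip(acc, 1e-3, 1 - 1e-3)
        # Accuracy expressed as a log-odds weight, matching the exp(w^T phi) form.
        w = np.log(acc * (n_classes - 1) / (1 - acc))
        # E-step: recompute the posterior over the latent label.
        post = posterior(M, w, n_classes)
    return w

# Toy data: each "explanation" is the set of examples its teacher gets wrong.
y_true = np.arange(12) % 2
examples = list(range(12))
explanations = [set(), {0, 1}, {2, 3, 4, 5}]

def toy_classifier(x, wrong_on):
    label = x % 2
    return 1 - label if x in wrong_on else label

M = build_label_matrix(toy_classifier, examples, explanations)
adapt_idx = split_adaptation(len(examples), alpha=1.0)
w = fit_aggregator(M[adapt_idx], n_classes=2)
pred = posterior(M, w, n_classes=2).argmax(axis=1)
```

In this toy setup no labels are used during fitting; the aggregator infers from agreement patterns alone that the first teacher is the most reliable and the third the least, which is the intuition behind weighting explanations from multiple teachers.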