RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models

Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang
2024-09-30
Abstract: Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. To achieve this, routing is a promising method, which learns a router to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well for a query. To address this problem, in this paper, we propose a method called query-based Router by Dual Contrastive learning (RouterDC). The RouterDC model consists of an encoder and LLM embeddings, and we propose two contrastive learning losses to train the RouterDC model. Experimental results show that RouterDC is effective in assembling LLMs and largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. Source code is available at https://github.com/shuhao02/RouterDC.
Machine Learning, Artificial Intelligence, Computation and Language
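As a minimal sketch of the routing step described in the abstract (an encoder plus one learned embedding per LLM), the snippet below scores each candidate LLM by the similarity between a query embedding and that LLM's embedding, then picks the highest-scoring one. It assumes cosine similarity and uses a fixed vector as a stand-in for the encoder output. The names `route` and `cosine_sim` and all numbers are hypothetical; this is not code from the RouterDC repository.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between vector a (d,) and each row of b (T, d)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return b @ a


def route(query_embedding: np.ndarray, llm_embeddings: np.ndarray) -> tuple[int, np.ndarray]:
    """Pick the LLM whose embedding k_t is most similar to the query embedding.

    Assumption: sim is cosine similarity. Returns the selected index and a
    softmax distribution over the T candidate LLMs.
    """
    scores = cosine_sim(query_embedding, llm_embeddings)
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(scores)), probs


# Toy example: a 4-dim stand-in for E(x; w) and three hypothetical LLM embeddings.
query = np.array([1.0, 0.0, 0.5, 0.0])
llms = np.array([
    [0.0, 1.0, 0.0, 0.0],  # LLM 0
    [0.9, 0.1, 0.4, 0.0],  # LLM 1 (closest in direction to the query)
    [0.0, 0.0, 0.0, 1.0],  # LLM 2
])
chosen, probs = route(query, llms)
print(chosen)  # 1
```

Only the selected LLM is then queried, which is why routing is cheaper at inference time than ensembling schemes that run every candidate.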
What problem does this paper attempt to address?
The paper addresses how to effectively assemble multiple off-the-shelf large language models (LLMs) so as to exploit their complementary capabilities. Existing routing methods select one LLM per query, but they struggle when several LLMs perform well on the same query, because the router cannot effectively distinguish among them. To solve this problem, the paper proposes RouterDC, a query-based router trained by dual contrastive learning, which aims to improve routing accuracy and efficiency.

### Problem Background

LLMs perform well across many tasks, but each model has its own strengths and weaknesses. Assembling several LLMs therefore lets one exploit their complementary capabilities. However, existing routing methods are ineffective when multiple LLMs perform well on the same query. For example, ZOOTER uses reward-model scores as the supervision signal; when several LLMs perform similarly, this supervision leads the router to assign small probabilities to each of them, which hurts selection accuracy.

### Solution

RouterDC improves how the router is trained by introducing two contrastive losses. The model consists of a query encoder \(E(x; w)\) and a learnable embedding \(k_t\) for each candidate LLM \(t\):

1. **Sample-LLM Contrastive Loss**: For each query \(x_i\), the candidate LLMs are split by their performance on that query into positive samples (well-performing LLMs, index set \(I^+_i\)) and negative samples (poorly performing LLMs, index set \(I^-_i\)). The contrastive loss pulls the query embedding toward the embeddings of the positive LLMs and pushes it away from the embeddings of the negative LLMs:
\[ L_{\text{sample-LLM}}(x_i, y_i; \theta) = -\sum_{t^+ \in I^+_i} \log \frac{\exp(\text{sim}(E(x_i; w), k_{t^+}))}{\exp(\text{sim}(E(x_i; w), k_{t^+})) + \sum_{t^- \in I^-_i} \exp(\text{sim}(E(x_i; w), k_{t^-}))} \]

2. **Sample-Sample Contrastive Loss**: To improve training stability, the paper adds a sample-sample contrastive loss. The training queries are clustered into groups; for a query \(x_i\), \(x^+_i\) is a query from the same group and \(X^-_i\) is a set of queries from other groups. The loss pulls the embeddings of queries in the same group together and pushes the embeddings of queries from different groups apart:

\[ L_{\text{sample-sample}}(x_i; \theta) = -\log \frac{\exp(\text{sim}(E(x_i; w), E(x^+_i; w)))}{\exp(\text{sim}(E(x_i; w), E(x^+_i; w))) + \sum_{x^-_i \in X^-_i} \exp(\text{sim}(E(x_i; w), E(x^-_i; w)))} \]

### Experimental Results

RouterDC outperforms both existing routing methods and the top-performing individual LLMs on in-distribution and out-of-distribution tasks:

- In-distribution tasks: average accuracy improves by 2.76%.
- Out-of-distribution tasks: average accuracy improves by 1.90%.
- Inference is about 6 times faster than with the voting method.

In summary, by training the router with dual contrastive learning, the paper addresses the failure of existing routers when multiple LLMs perform similarly, and improves both accuracy and efficiency.
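The two contrastive losses can be sketched in NumPy as a forward pass for a single query. This is a minimal illustration under stated assumptions, not the authors' implementation: `sim` is taken to be cosine similarity with no temperature; the positive/negative split uses hypothetical per-LLM scores (top 2 as \(I^+_i\), the rest as \(I^-_i\)); cluster membership is given directly rather than computed by a clustering step; and the losses are summed with a hypothetical weight `lam`. In practice the encoder parameters \(w\) and the LLM embeddings \(k_t\) would be trained by gradient descent in an autodiff framework.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between vector a (d,) and each row of b (n, d)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return b @ a


def log_softmax_first(logits: np.ndarray) -> float:
    """log(exp(logits[0]) / sum_j exp(logits[j])), computed stably."""
    m = logits.max()
    return float(logits[0] - (m + np.log(np.exp(logits - m).sum())))


def sample_llm_loss(q, llm_emb, pos_idx, neg_idx) -> float:
    """L_sample-LLM for one query: each positive LLM competes against all negatives."""
    s = cosine_sim(q, llm_emb)
    neg = s[neg_idx]
    return -sum(log_softmax_first(np.concatenate(([s[t]], neg))) for t in pos_idx)


def sample_sample_loss(q, q_pos, q_negs) -> float:
    """L_sample-sample: pull x_i toward a same-cluster query, away from other clusters."""
    s_pos = cosine_sim(q, q_pos[None, :])[0]
    s_neg = cosine_sim(q, q_negs)
    return -log_softmax_first(np.concatenate(([s_pos], s_neg)))


# Toy data; every value below is hypothetical.
rng = np.random.default_rng(0)
d, T = 8, 4
q = rng.normal(size=d)                      # stand-in for E(x_i; w)
llm_emb = rng.normal(size=(T, d))           # learnable k_1, ..., k_T
perf = np.array([0.90, 0.85, 0.20, 0.10])   # per-LLM scores on x_i
order = np.argsort(-perf)
pos_idx, neg_idx = order[:2], order[2:]     # top 2 as I^+_i, rest as I^-_i
q_pos = rng.normal(size=d)                  # a query from the same cluster as x_i
q_negs = rng.normal(size=(3, d))            # queries from other clusters
lam = 1.0                                   # hypothetical weight on the second loss
total = sample_llm_loss(q, llm_emb, pos_idx, neg_idx) + lam * sample_sample_loss(q, q_pos, q_negs)
print(f"total loss: {total:.4f}")
```

Note the design of the sample-LLM term: each positive LLM appears only with the negatives in its denominator, not with the other positives, so several well-performing LLMs are not forced to compete with one another. This matches the formula above and is the property that addresses the "multiple good LLMs" case.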