Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings

Logan Hallee, Rohan Kapur, Arjun Patel, Jason P. Gleghorn, Bohdan Khomtchouk
2024-05-31
Abstract: The advancement of transformer neural networks has significantly elevated the capabilities of sentence similarity models, but they struggle with highly discriminative tasks and produce sub-optimal representations of important documents like scientific literature. With the increased reliance on retrieval augmentation and search, representing diverse documents as concise and descriptive vectors is crucial. This paper improves upon the vector embeddings of scientific literature by assembling niche datasets using co-citations as a similarity metric, focusing on biomedical domains. We apply a novel Mixture of Experts (MoE) extension pipeline to pretrained BERT models, where every multi-layer perceptron section is enlarged and copied into multiple distinct experts. Our MoE variants perform well over $N$ scientific domains with $N$ dedicated experts, whereas standard BERT models excel in only one domain. Notably, extending just a single transformer block to MoE captures 85% of the benefit seen from full MoE extension at every layer. This holds promise for versatile and efficient One-Size-Fits-All transformer networks for numerically representing diverse inputs. Our methodology marks significant advancements in representing scientific text and holds promise for enhancing vector database search and compilation.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the inadequacy of existing large language models (LLMs) at generating reliable vector embeddings and performing precise classification, especially in information retrieval and web search. Although transformer-based LLMs have been widely adopted since 2017, they still struggle with highly discriminative tasks, particularly for documents that require high-precision representation, such as scientific literature. Current sentence similarity models have made breakthroughs in fields like sentiment analysis, but they perform poorly on subtle differences within specific domains, which leads to suboptimal representations of many important documents.

The paper therefore combines contrastive learning with Mixture of Experts (MoE) to extend pretrained BERT models and improve vector embeddings of scientific literature. The approach has two parts:

1. **Domain-specific fine-tuning**: Using co-citation as a similarity measure, contrastive fine-tuning is applied to a pretrained BERT model so that it learns a specific scientific domain.
2. **General applicability through Mixture of Experts**: A scalable method applies MoE to pretrained BERT models across multiple domains, with the aim of creating a versatile "one-size-fits-all" model.

The methodology marks a significant advancement in representing scientific texts and promises to enhance the search and compilation capabilities of vector databases. Experimental results show that the proposed models significantly outperform general pretrained models, fine-tuned sentence similarity models, and science-oriented BERT models in multiple biomedical fields.
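The contrastive fine-tuning step can be illustrated with a minimal sketch. The summary does not state the exact loss, so this assumes an InfoNCE-style in-batch objective: each anchor embedding should be closer to the embedding of its co-cited paper than to the other papers in the batch. The function names, the `temperature` value, and the toy two-dimensional vectors are all illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive loss (assumed objective, not from the paper).

    anchors[i] and positives[i] are embeddings of a co-cited pair; every
    other positives[j] in the batch serves as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the co-cited paper as the correct class.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings: two papers pointing in orthogonal directions.
papers = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = info_nce_loss(papers, papers)          # pairs line up
mismatched_loss = info_nce_loss(papers, papers[::-1])  # pairs are swapped
print(matched_loss < mismatched_loss)
```

In a real pipeline the vectors would come from the BERT encoder and the loss would be minimized by gradient descent; the sketch only shows why co-cited pairs that embed close together yield a lower loss than pairs that do not.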
Specifically, the proposed MoE variant achieves performance comparable to multiple independent models across various domains, suggesting that a "one-size-fits-all" transformer network might be feasible for certain tasks. These models have profound implications for applications relying on precise text classification and vector embeddings, such as information retrieval and web search.
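The MoE extension described in the abstract, where a pretrained multi-layer perceptron section is copied into multiple distinct experts, can be sketched as follows. This is a simplified illustration under stated assumptions: the MLP is a tiny two-layer perceptron stored as plain lists, the enlargement step mentioned in the abstract is omitted, and the expert is selected by a known `domain_id` (the routing mechanism is not described in the text above, so this choice is hypothetical). All names are invented for the example.

```python
import copy


def mlp_forward(mlp, x):
    """Two-layer perceptron with ReLU: w2 @ relu(w1 @ x + b1) + b2."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(mlp["w1"], mlp["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(mlp["w2"], mlp["b2"])
    ]


def extend_to_moe(pretrained_mlp, num_experts):
    """Copy one pretrained MLP into num_experts independent experts."""
    return [copy.deepcopy(pretrained_mlp) for _ in range(num_experts)]


def moe_forward(experts, x, domain_id):
    """Send the input to the expert dedicated to its domain (assumed routing)."""
    return mlp_forward(experts[domain_id], x)


# Hypothetical pretrained weights for a 2 -> 2 -> 1 perceptron.
pretrained = {
    "w1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, 0.0],
    "w2": [[1.0, 1.0]],
    "b2": [0.0],
}
experts = extend_to_moe(pretrained, num_experts=3)
x = [2.0, 1.0]

# Immediately after extension, every expert reproduces the pretrained output.
print(all(moe_forward(experts, x, d) == mlp_forward(pretrained, x) for d in range(3)))

# Stand-in for domain-specific fine-tuning: only expert 1 changes.
experts[1]["w2"][0][0] = 2.0
print(moe_forward(experts, x, 0), moe_forward(experts, x, 1))
```

Because each expert starts as an exact copy, the extended model initially behaves like the original network, and the experts diverge only as each is trained on its own domain.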