Scaling Laws for Discriminative Classification in Large Language Models

Dean Wyatte, Fatemeh Tahmasbi, Ming Li, Thomas Markovich
2024-05-25
Abstract: Modern large language models (LLMs) represent a paradigm shift in what can plausibly be expected of machine learning models. The fact that LLMs can effectively generate sensible answers to a diverse range of queries suggests that they would be useful in customer support applications. While powerful, LLMs have been observed to be prone to hallucination, which unfortunately makes their near-term use in customer support applications challenging. To address this issue, we present a system that allows us to use an LLM to augment our customer support advocates by re-framing the language modeling task as a discriminative classification task. In this framing, we seek to present the top-K best template responses for a customer support advocate to use when responding to a customer. We present the results of both offline and online experiments, where we observed offline gains and statistically significant online lifts for our experimental system. Along the way, we present observed scaling curves for validation loss and top-K accuracy, resulting from model parameter ablation studies. We close by discussing the space of trade-offs with respect to model size, latency, and accuracy, and by suggesting future applications to explore.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper examines the limitations of large language models (LLMs) in customer support applications, especially their tendency to hallucinate answers and leak data. To address these issues, the authors recast the language modeling task as a discriminative classification task that selects the most appropriate reply templates for customer support advocates to use. In offline and online experiments, the approach improved system performance, and the authors report scaling laws for validation loss and top-K accuracy. The paper also discusses the trade-offs among model size, latency, and accuracy, along with potential future directions. Additionally, the authors report the first implementation of converting LLMs into discriminative classifiers in an industrial setting and investigate scaling laws for closed-domain adaptation.
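To make the top-K framing concrete, the following is a minimal sketch of how top-K accuracy could be computed when a classifier scores a fixed set of response templates. It is not the authors' implementation: the function names `top_k_indices` and `top_k_accuracy`, the score values, and the labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring templates, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_accuracy(
    all_scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of examples whose correct template is among the top k."""
    hits = sum(
        1
        for scores, label in zip(all_scores, labels)
        if label in top_k_indices(scores, k)
    )
    return hits / len(labels)


# Hypothetical classifier scores: one row per customer message,
# one column per candidate response template.
example_scores = [
    [0.1, 2.3, 0.4, 1.9],
    [1.2, 0.3, 0.8, 0.1],
    [0.2, 0.1, 0.05, 3.0],
]
# Hypothetical index of the template the advocate actually used.
example_labels = [3, 2, 1]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(example_scores, example_labels, k):.3f}")
```

In this toy data the correct template is never ranked first but is often ranked second or third, which shows why presenting several candidates to an advocate can be useful even when top-1 accuracy is low.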