Abstract: While the success of large language models (LLMs) increases demand for machine-generated text, current pay-per-token pricing schemes create a misalignment of incentives known in economics as moral hazard: Text-generating agents have strong incentive to cut costs by preferring a cheaper model over the cutting-edge one, and this can be done "behind the scenes" since the agent performs inference internally. In this work, we approach this issue from an economic perspective, by proposing a pay-for-performance, contract-based framework for incentivizing quality. We study a principal-agent game where the agent generates text using costly inference, and the contract determines the principal's payment for the text according to an automated quality evaluation. Since standard contract theory is inapplicable when internal inference costs are unknown, we introduce cost-robust contracts. As our main theoretical contribution, we characterize optimal cost-robust contracts through a direct correspondence to optimal composite hypothesis tests from statistics, generalizing a result of Saig et al. (NeurIPS'23). We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts.

What problem does this paper attempt to address?

The paper addresses the misaligned economic incentives that pay-per-token pricing creates for text generation with large language models (LLMs). Specifically:

1. **Moral Hazard under the Pay-per-Token Scheme**: Pay-per-token pricing gives text-generating agents a strong incentive to cut costs by using a cheaper, lower-quality model. Because inference happens internally, the consumer cannot easily detect the substitution. Economics calls this "moral hazard".
2. **Risks in Complex Tasks**: For complex text-generation tasks (such as medical record summarization), using a high-quality model is crucial. Under pay-per-token pricing, a provider may still use a lower-cost, lower-quality model to increase profit, which puts consumers at risk.

To address these problems, the authors take an economic perspective and propose a contract-based, pay-for-performance framework in which payment depends on an automated quality evaluation of the generated text. The main elements are:

- **Cost-Robust Contracts**: Because the agent's internal inference costs are unknown to the principal, the authors introduce cost-robust contracts, which keep the incentive to generate high-quality text effective even when the cost structure is uncertain.
- **Theoretical Contribution**: The authors characterize optimal cost-robust contracts through a direct correspondence to optimal composite hypothesis tests from statistics. This generalizes a result of Saig et al. (NeurIPS'23).
- **Empirical Evaluation**: The authors derive contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts give up only a marginal amount of objective value compared to their cost-aware counterparts.

In summary, the paper's core objective is to design pay-for-performance contracts, using tools from economics and statistics, that incentivize high-quality text generation and thereby resolve the incentive misalignment in existing pricing schemes.
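
To make the incentive problem concrete, the following is a minimal sketch of a textbook two-action principal-agent example, not the paper's cost-robust construction. Everything in it is an illustrative assumption: the two models ("cheap" and "strong"), their costs, the three quality-score outcomes, the outcome distributions, and the helper names `expected_pay`, `best_action`, and `min_pay_contract`. It shows that a flat payment (the analogue of paying per token regardless of quality) leads the agent to pick the cheap model, while a contract that pays only on the outcome with the highest likelihood ratio, as in a likelihood-ratio test, makes the strong model a best response when the costs are known.

```python
from fractions import Fraction as F

def expected_pay(contract, dist):
    """Expected payment when outcome j occurs with probability dist[j]."""
    return sum(t * p for t, p in zip(contract, dist))

def best_action(contract, dists, costs):
    """Action maximizing expected pay minus cost; ties go to the costlier action."""
    utils = [expected_pay(contract, d) - c for d, c in zip(dists, costs)]
    return max(range(len(utils)), key=lambda a: (utils[a], costs[a]))

def min_pay_contract(dists, costs, cheap=0, strong=1):
    """Cheapest non-negative contract making `strong` weakly preferred to `cheap`.

    With two actions and known costs, it pays only on the outcome with the
    largest likelihood ratio P_strong(j) / P_cheap(j).
    """
    gaps = [ps - pc for pc, ps in zip(dists[cheap], dists[strong])]
    # Maximizing gap / P_strong(j) is equivalent to maximizing the likelihood ratio.
    j = max((k for k in range(len(gaps)) if gaps[k] > 0),
            key=lambda k: gaps[k] / dists[strong][k])
    contract = [F(0)] * len(gaps)
    # Smallest payment satisfying the incentive constraint with equality.
    contract[j] = (costs[strong] - costs[cheap]) / gaps[j]
    return contract

# Hypothetical numbers: quality score is low / mid / high.
dists = [
    [F(1, 2), F(3, 10), F(1, 5)],   # cheap model
    [F(1, 5), F(3, 10), F(1, 2)],   # strong model
]
costs = [F(1), F(3)]                # assumed inference costs

flat = [F(5), F(5), F(5)]           # same payment whatever the quality
contract = min_pay_contract(dists, costs)

print("flat pay -> agent picks action", best_action(flat, dists, costs))
print("contract:", contract)
print("contract -> agent picks action", best_action(contract, dists, costs))
```

Exact fractions are used so that the tie at the incentive constraint is resolved deterministically rather than by floating-point rounding. The sketch assumes the principal knows both costs; the cost-robust setting described above drops exactly that assumption.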

Incentivizing Quality Text Generation via Statistical Contracts

MotiLearn: Contract-Based Incentive Mechanism for Heterogeneous Edge Collaborative Training

Eliciting Informative Text Evaluations with Large Language Models

Delegated Classification

Contractual Reinforcement Learning: Pulling Arms with Invisible Hands

Incentivizing Evaluation via Limited Access to Ground Truth: Peer-Prediction Makes Things Worse

Estimating Effects of Incentive Contracts in Online Labor Platforms

Towards Mitigating Perceived Unfairness in Contracts from a Non-Legal Stakeholder's Perspective

Algorithmic Contract Design with Reinforcement Learning Agents

Incentive contract design considering quotas production: A principal-agent perspective

Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies

Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL

Incentivizing Honesty among Competitors in Collaborative Learning and Optimization

Mechanism Design for Large Language Models

Formal contracts mitigate social dilemmas in multi-agent reinforcement learning

Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework

Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation

Incentive Compatibility in Stochastic Dynamic Systems

Contracting With a Reinforcement Learning Agent by Playing Trick or Treat

Contracts with Private Cost per Unit-of-Effort

Incentive Design with Spillovers