Abstract: First-price auctions have very recently swept the online advertising industry, replacing second-price auctions as the predominant auction mechanism on many platforms. This shift has brought forth important challenges for a bidder: how should one bid in a first-price auction, where, unlike in second-price auctions, it is no longer optimal to bid one's private value truthfully and the other bidders' behavior is hard to know? In this paper, we take an online learning angle and address the fundamental problem of learning to bid in repeated first-price auctions, where both the bidder's private valuations and other bidders' bids can be arbitrary. We develop the first minimax optimal online bidding algorithm, which achieves $\widetilde{O}(\sqrt{T})$ regret over $T$ auctions when competing with the set of all Lipschitz bidding policies, a strong oracle that contains a rich set of bidding strategies. This novel algorithm is built on two ideas, both of which could be of independent interest in online learning: the insight that the presence of a good expert can be leveraged to improve performance, and an original hierarchical expert-chaining structure. Further, by exploiting the product structure that exists in the problem, we modify this algorithm, which in its vanilla form is statistically optimal but computationally infeasible, into a computationally efficient and space-efficient algorithm that retains the same $\widetilde{O}(\sqrt{T})$ minimax optimal regret guarantee. Additionally, through an impossibility result, we highlight that one is unlikely to compete this favorably with an oracle stronger than the Lipschitz bidding policies considered here. Finally, we test our algorithm on three real-world first-price auction datasets obtained from Verizon Media and demonstrate its superior performance compared to several existing bidding algorithms.
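To make the setting concrete, the sketch below shows the per-auction reward and a simple full-information baseline. In a first-price auction the winner pays its own bid, so a bidder with value $v$ bidding $b$ against a highest competing bid $m$ earns $r(b; v, m) = (v - b)\mathbb{1}\{b \ge m\}$. Regret is then the gap between the best policy $f$ in the comparison class, $\max_f \sum_t r(f(v_t); v_t, m_t)$, and the learner's own total $\sum_t r(b_t; v_t, m_t)$. The code runs plain exponential weights (Hedge) over a small, hypothetical class of linear shading policies $b = \theta v$, which are Lipschitz for $\theta \in [0, 1]$. It rests on several assumptions. Ties are won by the bidder. The highest competing bid is revealed after every auction (full-information feedback). Values lie in $[0, 1]$. The names `HedgeBidder` and `shading_policy` and the synthetic data are illustrative only. This is a baseline, not the paper's algorithm, which per the abstract adds a hierarchical expert-chaining structure so that it can compete with the full Lipschitz class.

```python
import math
import random
from typing import Callable, List, Sequence


def first_price_reward(value: float, bid: float, highest_other: float) -> float:
    """Utility of one first-price auction: the winner pays its own bid.

    Assumption: ties are resolved in the bidder's favour (win iff bid >= highest_other).
    """
    return value - bid if bid >= highest_other else 0.0


class HedgeBidder:
    """Exponential weights (Hedge) over a finite set of bidding policies.

    A simple baseline, NOT the paper's chaining algorithm. Assumes full-information
    feedback: the highest competing bid is revealed after every auction, so each
    policy's counterfactual reward can be computed.
    """

    def __init__(self, policies: Sequence[Callable[[float], float]], horizon: int, seed: int = 0):
        self.policies = list(policies)
        # Standard Hedge step size for rewards in [0, 1] and a known horizon.
        self.eta = math.sqrt(8.0 * math.log(len(self.policies)) / horizon)
        self.log_weights = [0.0] * len(self.policies)
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Softmax of the log-weights, shifted by the max for numerical stability.
        top = max(self.log_weights)
        w = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(w)
        return [x / total for x in w]

    def bid(self, value: float) -> float:
        # Sample one policy in proportion to its weight and apply it to the private value.
        i = self.rng.choices(range(len(self.policies)), weights=self.probabilities())[0]
        return self.policies[i](value)

    def update(self, value: float, highest_other: float) -> None:
        # Full-information update: credit every policy with the reward it would have earned.
        for i, policy in enumerate(self.policies):
            self.log_weights[i] += self.eta * first_price_reward(value, policy(value), highest_other)


def shading_policy(theta: float) -> Callable[[float], float]:
    """Hypothetical linear bid-shading policy b = theta * v (Lipschitz for theta in [0, 1])."""
    return lambda v: theta * v


if __name__ == "__main__":
    T = 5000
    thetas = [k / 10 for k in range(11)]
    policies = [shading_policy(t) for t in thetas]
    bidder = HedgeBidder(policies, horizon=T, seed=1)
    data = random.Random(42)
    learner_total = 0.0
    policy_totals = [0.0] * len(policies)
    for _ in range(T):
        v = data.random()        # private value in [0, 1)
        m = 0.6 * data.random()  # synthetic highest competing bid (illustrative only)
        learner_total += first_price_reward(v, bidder.bid(v), m)
        for i, p in enumerate(policies):
            policy_totals[i] += first_price_reward(v, p(v), m)
        bidder.update(v, m)
    print(f"learner: {learner_total:.1f}, best shading policy: {max(policy_totals):.1f}")
```

Because the update credits every policy with its counterfactual reward, this baseline depends on the highest competing bid being observed. Discretizing a richer Lipschitz class this way gives a much larger expert set, and that growth is the difficulty the chaining structure described in the abstract is designed to address.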

Online Learning for Auction Mechanism in Bandit Setting

Auction-Based Combinatorial Multi-Armed Bandit Mechanisms with Strategic Arms

Multi-Armed Bandit Mechanisms for Multi-Slot Sponsored Search Auctions

Non-stationary Continuum-armed Bandits for Online Hyperparameter Optimization

Online Ad Procurement in Non-stationary Autobidding Worlds

Bandit Learning to Rank with Position-Based Click Models: Personalized and Equal Treatments

Combination of Auction Theory and Multi-Armed Bandits: Model, Algorithm, and Application

An Optimal Bidimensional Multi-Armed Bandit Auction for Multi-unit Procurement

Online Learning for Measuring Incentive Compatibility in Ad Auctions

Multi-Armed Bandit with Budget Constraint and Variable Costs

Improved Online Learning Algorithms for CTR Prediction in Ad Auctions

Deep Reinforcement Learning for Sponsored Search Real-time Bidding

Leveraging (Biased) Information: Multi-armed Bandits with Offline Data

Infer Your Enemies and Know Yourself, Learning in Real-Time Bidding with Partially Observable Opponents

Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach

Multi-armed bandits for bid shading in first-price real-time bidding auctions

Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

Learning to Bid Optimally and Efficiently in Adversarial First-price Auctions

Adversarial Constrained Bidding Via Minimax Regret Optimization with Causality-Aware Reinforcement Learning

Learning Robust Search Strategies Using a Bandit-Based Approach

Socially-Optimal Mechanism Design for Incentivized Online Learning