Abstract: We explore brokerage between traders in an online learning framework. At any round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuation. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming to enhance traders' total earnings by maximizing the gain from trade, defined as the sum of the traders' net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders' valuations as an i.i.d. process with an unknown distribution. If the traders' valuations are revealed after each interaction (full feedback) and the cumulative distribution function (cdf) of the traders' valuations is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constant factors. If only their willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the cdf of the traders' valuations is Lipschitz, and show that this rate is near-optimal. We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown cdf of the traders' valuations. If we drop the continuity assumption on the cdf, the regret rate degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time horizon. If we drop the Lipschitz assumption on the cdf, learning becomes impossible in the $2$-bit feedback case.
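To make the protocol concrete, here is a minimal Python sketch of a single round under the trade rule described above (a trade happens when the proposed price lies between the two valuations), together with the two feedback models. The names `play_round` and `trade_frequency`, and the use of i.i.d. Uniform(0, 1) valuations in the simulation, are illustrative assumptions and are not taken from the paper.

```python
import random
from typing import Tuple


def play_round(p: float, v1: float, v2: float) -> Tuple[bool, Tuple[float, float], Tuple[bool, bool]]:
    """One round at price p: returns (traded, full feedback, 2-bit feedback)."""
    # A trade happens when p lies between the two valuations: the trader who
    # values the asset less is willing to sell at p, the other is willing to buy.
    traded = min(v1, v2) <= p <= max(v1, v2)
    # Full feedback: both private valuations are revealed after the round.
    full_feedback = (v1, v2)
    # 2-bit feedback: only whether each trader would sell at p (valuation <= p).
    two_bit_feedback = (v1 <= p, v2 <= p)
    return traded, full_feedback, two_bit_feedback


def trade_frequency(p: float, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the trade probability at a fixed price p,
    assuming i.i.d. Uniform(0, 1) valuations (an illustrative choice)."""
    rng = random.Random(seed)
    trades = sum(play_round(p, rng.random(), rng.random())[0] for _ in range(n))
    return trades / n


print(play_round(0.5, 0.3, 0.7))  # → (True, (0.3, 0.7), (True, False))
print(trade_frequency(0.5), trade_frequency(0.8))
```

For uniform valuations the trade probability at price $p$ is $2p(1-p)$, so the two estimates should land near $0.5$ (the median price) and $0.32$ respectively, which already hints at why the median matters for trading volume.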

What problem does this paper attempt to address?

The paper studies brokerage between traders in an online learning framework, with the goal of maximizing the trading volume. In each round, two traders meet and a broker proposes a trading price; each trader decides whether to sell or buy by comparing the price with their private valuation, and a trade occurs when one trader is willing to sell and the other is willing to buy at that price. The main contributions are:

1. **Theoretical Background and Objectives**:
   - Traders' valuations are modeled as an independent and identically distributed (i.i.d.) process with an unknown distribution.
   - Previous work guided the broker toward maximizing the gain from trade, i.e., the sum of the traders' net utilities after each interaction.
   - This paper instead studies how to maximize the trading volume, i.e., the total number of trades that occur.
2. **Research Setup**:
   - In each round of the online learning protocol, two traders arrive with private valuations and the broker proposes a price; the trade occurs if the price lies between the two valuations.
   - Two feedback models are considered: full feedback (both valuations are revealed after each round) and 2-bit feedback (only each trader's willingness to sell or buy at the proposed price is revealed).
3. **Main Results**:
   - With full feedback, if the cumulative distribution function (CDF) of the traders' valuations is continuous, the paper gives an algorithm with logarithmic regret and proves that this rate is optimal up to constant factors.
   - With 2-bit feedback, if the CDF is Lipschitz, the paper gives an algorithm with poly-logarithmic regret and proves that this rate is near-optimal.
   - Relaxing these assumptions worsens the picture: without a continuous CDF, the full-feedback regret degrades to $\Theta(\sqrt{T})$, where $T$ is the time horizon; without a Lipschitz CDF, learning becomes impossible under 2-bit feedback.
4. **Technical Challenges and Solutions**:
   - A key "median lemma" states that, to maximize the trading volume, the broker should propose a price as close as possible to the median of the traders' valuation distribution.
   - For full feedback, the paper designs an algorithm based on the empirical median and proves its regret guarantee.
   - For 2-bit feedback, the paper combines the information carried by the two bits with the insight of Lemma 1 to design a binary search algorithm that attains the poly-logarithmic regret rate.
5. **Related Work**:
   - The paper reviews research on bilateral trade, in particular studies from the game-theoretic and approximation perspectives, as well as recent work on bilateral trade in online learning settings.

In summary, the paper studies the bilateral trade problem under a new objective (maximizing trading volume), designs new algorithms for it, and provides theoretical guarantees.
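The median lemma can be checked directly: if the cdf $F$ is continuous, ties have probability zero, and a trade at price $p$ happens with probability $2F(p)(1-F(p)) = \tfrac12 - 2\big(F(p)-\tfrac12\big)^2$, which is maximized exactly when $F(p) = \tfrac12$. Below is a minimal sketch, under that assumption, of a full-feedback broker that always proposes the empirical median of all valuations revealed so far. It illustrates the idea summarized above and is not the paper's exact algorithm; the function names, the default first price of 0.5, and the uniform valuations in the example are hypothetical choices. Regret is scored against the best fixed price, whose trade probability is $\tfrac12$.

```python
import bisect
import random
import statistics
from typing import Callable, List, Tuple


def uniform_cdf(x: float) -> float:
    """cdf of Uniform(0, 1); used only to score the simulation."""
    return min(max(x, 0.0), 1.0)


def trade_probability(p: float, cdf: Callable[[float], float]) -> float:
    """P(min(V, W) <= p <= max(V, W)) for i.i.d. V, W with a continuous cdf."""
    f = cdf(p)
    return 2.0 * f * (1.0 - f)


def empirical_median_broker(
    T: int,
    sample: Callable[[random.Random], float],
    cdf: Callable[[float], float],
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Full feedback: propose the median of all valuations revealed so far."""
    rng = random.Random(seed)
    observed: List[float] = []  # kept sorted with bisect.insort
    prices: List[float] = []
    regret = 0.0
    for _ in range(T):
        # Hypothetical default price before any valuation has been observed.
        p = statistics.median(observed) if observed else 0.5
        prices.append(p)
        # Expected regret of this round against the best fixed price (probability 1/2).
        regret += 0.5 - trade_probability(p, cdf)
        # Full feedback: both valuations are revealed after the round.
        v1, v2 = sample(rng), sample(rng)
        bisect.insort(observed, v1)
        bisect.insort(observed, v2)
    return regret, prices


regret, prices = empirical_median_broker(2000, lambda rng: rng.random(), uniform_cdf)
print(f"cumulative regret after 2000 rounds: {regret:.3f}")
print(f"last proposed price: {prices[-1]:.4f}")
```

Since the per-round regret is $2(F(p)-\tfrac12)^2$, it shrinks quadratically as the empirical median concentrates around the true median, which is the intuition behind a logarithmic cumulative regret.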
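Under 2-bit feedback the broker never sees the valuations, but at price $p$ the two bits $\mathbb{1}\{V_1 \le p\}$ and $\mathbb{1}\{V_2 \le p\}$ are two independent Bernoulli($F(p)$) samples, which is enough to estimate on which side of the median $p$ lies. The sketch below is a simplified noisy bisection in that spirit, not the paper's algorithm: assuming valuations in $[0, 1]$, it keeps an interval believed to contain the median, proposes the midpoint for a whole epoch, and halves the interval according to whether more than half of the collected bits say "valuation $\le$ price". The epoch schedule (epoch $k$ lasts about $c \cdot 4^k \log T$ rounds), the parameter `c`, and the function names are hypothetical. The role of the Lipschitz assumption is visible here: once the interval has width $2^{-k}$ and still contains the median, an $L$-Lipschitz cdf keeps $|F(p) - \tfrac12| \le L 2^{-k}$ for every price in it, so the per-round regret $2(F(p)-\tfrac12)^2$ is at most $2L^2 4^{-k}$.

```python
import math
import random
from typing import Callable, List, Tuple


def uniform_cdf(x: float) -> float:
    """cdf of Uniform(0, 1); used only to score the simulation, never by the broker."""
    return min(max(x, 0.0), 1.0)


def bisection_broker(
    T: int,
    sample: Callable[[random.Random], float],
    cdf: Callable[[float], float],
    c: float = 1.0,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """2-bit feedback: noisy bisection for the median on [0, 1]."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    regret = 0.0
    prices: List[float] = []  # one proposed price per epoch
    t, k = 0, 0
    while t < T:
        p = (lo + hi) / 2
        prices.append(p)
        # Hypothetical schedule: epochs grow as the interval shrinks.
        n_k = min(max(1, math.ceil(c * 4**k * math.log(T + 1))), T - t)
        below = 0
        for _ in range(n_k):
            v1, v2 = sample(rng), sample(rng)
            # The broker only observes these two bits, not v1 and v2 themselves.
            below += (v1 <= p) + (v2 <= p)
        f = cdf(p)
        # Expected regret of the whole epoch against the median price.
        regret += n_k * (0.5 - 2.0 * f * (1.0 - f))
        # More than half of the 2 * n_k bits say "valuation <= p": median likely below p.
        if below > n_k:
            hi = p
        else:
            lo = p
        t += n_k
        k += 1
    return regret, prices


regret, prices = bisection_broker(20000, lambda rng: rng.random(), uniform_cdf)
print(f"epochs: {len(prices)}, cumulative regret: {regret:.2f}")
```

Growing the epochs geometrically keeps the number of epochs logarithmic in $T$, while each epoch is long enough for the estimate of $F(p)$ to resolve the current interval width.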

Trading Volume Maximization with Online Learning