Abstract: This study investigates the development of an optimal execution strategy through reinforcement learning, aiming to determine the most effective approach for traders to buy and sell inventory within a limited time frame. Our proposed model leverages input features derived from the current state of the limit order book. To simulate this environment and overcome the limitations associated with relying on historical data, we utilize the multi-agent market simulator ABIDES, which provides a diverse range of depth levels within the limit order book. We present a custom MDP formulation followed by the results of our methodology and benchmark the performance against standard execution strategies. Our findings suggest that the reinforcement learning-based approach demonstrates significant potential.

What problem does this paper attempt to address?

The paper asks how a trader can use Reinforcement Learning (RL) to develop an optimal execution strategy for buying and selling inventory within a limited time frame. Specifically, the research aims to optimize the execution of large trades, reduce their impact on market prices, and minimize trading costs.

### Problem Background

In financial markets, a key challenge for institutions such as banks, asset management companies, hedge funds, and proprietary trading firms is how to execute large positions efficiently. Research shows that large transactions can move asset prices because the immediate depth of the market is limited: a single large order may exhaust all current buyers or sellers. For this reason, it is usually recommended to split a large order into multiple smaller pieces. In addition, the way a portfolio is adjusted can itself lead to unfavorable price movements, so traders must balance two risks: trading quickly and accepting possibly poor execution, or trading slowly and remaining exposed to unpredictable market fluctuations.

### Research Objectives

The goal of the paper is to develop an optimal execution strategy based on Reinforcement Learning that takes features of the current Limit Order Book (LOB) state as input, and to train the model to achieve the following:

1. **Minimize market impact**: Limit the effect on market prices through a sensible choice of trading speed and timing.
2. **Reduce trading costs**: Account for both direct trading fees and indirect costs (such as slippage and opportunity costs).
3. **Ensure execution efficiency**: Complete the trading task within the specified time to avoid additional penalties for unfinished transactions.
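
The depth argument in the background section can be made concrete with a small sketch. The order book below is hypothetical (prices are in integer cents and are not taken from the paper), and the comparison assumes the best ask level fully replenishes between child orders, which is an idealization a real market or a simulator such as ABIDES would not guarantee.

```python
from typing import List, Tuple

# Hypothetical ask side of a limit order book: (price in cents, size), best price first.
ASKS: List[Tuple[int, int]] = [(10000, 200), (10010, 300), (10030, 500)]


def market_buy_cost(asks: List[Tuple[int, int]], quantity: int) -> Tuple[int, float]:
    """Return (total cost, average price) of a market buy that walks the ask levels."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    cost = 0
    for price, size in asks:
        fill = min(remaining, size)  # take what this level offers, up to what is needed
        cost += fill * price
        remaining -= fill
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("order exceeds the visible depth of the book")
    return cost, cost / quantity


# One order of 600 shares consumes three price levels.
single_cost, single_avg = market_buy_cost(ASKS, 600)

# Three child orders of 200 shares, ASSUMING the book is restored between them.
split_cost = sum(market_buy_cost(ASKS, 200)[0] for _ in range(3))

print(single_cost, single_avg)   # 6006000 10010.0
print(split_cost)                # 6000000
```

Under these assumptions the single order pays 6,000 cents more than the split execution, purely because it exhausts the best level and trades at worse prices. The sketch ignores the other side of the trade-off named above: a slower schedule is exposed to price moves between child orders.
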
### Solutions

To achieve these goals, the paper adopts the following methods:

- **Multi-agent market simulator ABIDES**: Used to simulate the trading environment. It provides multiple depth levels of the limit order book and avoids the limitations of relying on historical data.
- **Custom MDP (Markov Decision Process) modeling**: The trade execution problem is formalized as an MDP with defined state, action, and reward functions.
- **Reinforcement learning algorithm DQN (Deep Q-Network)**: Used to train the agent to make execution decisions during the trading process.

### Main Contributions

The main contribution of the paper is to demonstrate the potential of Reinforcement Learning for the optimal execution problem, especially in high-dimensional, complex environments that traditional analytical methods struggle to handle. Experimental results show that, compared with traditional execution strategies, Reinforcement Learning-based strategies can manage market impact more effectively while maintaining lower trading costs.

### Formula Summary

The key formulas involved in the paper are as follows:

- **Average liquidation price**: \(P_n\) denotes the average price at which the shares traded in period \(n\) are liquidated.
- **Total liquidation cost**:
  \[ \min_{x\in A}E\left[\sum_{k = 0}^{N}P_kx_k\right] \]
  where \(A=\left\{\{x_0,x_1,\ldots,x_N\}\in\mathbb{R}^{N + 1}_+;\sum_{k = 0}^{N}x_k=X_0\right\}\), that is, the set of non-negative schedules that trade the full inventory \(X_0\).
- **Time-weighted average price (TWAP)**:
  \[ \text{TWAP}=\frac{X_0}{N}\sum_{k = 0}^{N}P_k \]

With these formulas and methods, the paper addresses the key problems in executing large trades and offers new ideas for future financial trading strategies.
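
As a worked illustration of the formulas above, the following sketch evaluates \(\sum_k P_k x_k\) for one fixed price path and checks whether a schedule belongs to the admissible set \(A\). The function names and the price path are hypothetical. The number of slices is passed explicitly because the summary's TWAP formula divides by \(N\) while summing over \(N+1\) periods; the sketch does not try to resolve that indexing. Prices are treated as given, so market impact is not modeled.

```python
from typing import List, Sequence


def liquidation_cost(prices: Sequence[float], schedule: Sequence[float]) -> float:
    """Realized value of sum_k P_k * x_k for a single price path."""
    if len(prices) != len(schedule):
        raise ValueError("prices and schedule must have the same length")
    return sum(p * x for p, x in zip(prices, schedule))


def is_admissible(schedule: Sequence[float], total: float, tol: float = 1e-9) -> bool:
    """Membership test for A: non-negative trades that sum to the inventory X_0."""
    return all(x >= 0 for x in schedule) and abs(sum(schedule) - total) <= tol


def twap_schedule(total: float, n_slices: int) -> List[float]:
    """Equal-sized slices, the schedule behind the TWAP benchmark."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    return [total / n_slices] * n_slices


# Hypothetical price path over four periods and an inventory of X_0 = 1000 shares.
prices = [100, 99, 98, 97]
x0 = 1000

twap = twap_schedule(x0, len(prices))   # [250.0, 250.0, 250.0, 250.0]
front_loaded = [x0, 0, 0, 0]            # trade everything in the first period

print(is_admissible(twap, x0))                  # True
print(liquidation_cost(prices, twap))           # 98500.0
print(liquidation_cost(prices, front_loaded))   # 100000
```

On this particular path the two schedules differ only because the price drifts; an RL agent in the paper's setting would instead choose \(x_k\) from the observed order book state, with the expectation taken over simulated paths.
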

Optimal Execution with Reinforcement Learning
