Abstract: This study investigates the development of an optimal execution strategy through reinforcement learning, aiming to determine the most effective approach for traders to buy and sell inventory within a limited time frame. Our proposed model leverages input features derived from the current state of the limit order book.
To simulate this environment and overcome the limitations associated with relying on historical data, we utilize the multi-agent market simulator ABIDES, which provides a diverse range of depth levels within the limit order book.
We present a custom MDP formulation followed by the results of our methodology and benchmark the performance against standard execution strategies. Our findings suggest that the reinforcement learning-based approach demonstrates significant potential.
What problem does this paper attempt to address?
The paper asks how, within a limited time frame, Reinforcement Learning (RL) can be used to develop an optimal execution strategy that determines the most effective way for traders to buy and sell inventory. Specifically, the research aims to optimize the execution of large-scale trades, reduce their impact on market prices, and minimize trading costs.
### Problem Background
In financial markets, a key challenge for institutions such as banks, asset management companies, hedge funds, and proprietary trading firms is how to execute large positions efficiently. Large transactions can move asset prices because the immediately available depth of the market is limited: a single large order may exhaust all resting buy or sell orders at the best prices and fill the remainder at progressively worse ones. It is therefore usually recommended to split a large order into several smaller pieces. Splitting introduces a trade-off of its own. Trading quickly risks poor execution through market impact, while trading slowly exposes the trader to unpredictable price fluctuations over a longer period.
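The depth argument above can be illustrated with a minimal sketch. The order book levels, the `walk_book` helper, and the order sizes are all hypothetical and are not taken from the paper; the sketch only shows how the average fill price worsens as a market buy order consumes successive ask levels.

```python
def walk_book(asks, quantity):
    """Return the average fill price of a market buy for `quantity` shares.

    `asks` is a list of (price, size) levels sorted from best to worst price.
    This is an illustrative helper, not part of ABIDES or the paper.
    """
    remaining = quantity
    cost = 0.0
    for price, size in asks:
        fill = min(remaining, size)  # take what this level can offer
        cost += fill * price
        remaining -= fill
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("order exceeds visible depth")
    return cost / quantity


# Hypothetical ask side of a limit order book: (price, size) per level.
asks = [(100.0, 200), (100.5, 300), (101.0, 500)]

small_avg = walk_book(asks, 100)    # fits inside the best level
large_avg = walk_book(asks, 1000)   # consumes all three levels
```

In this toy book the small order fills entirely at the best ask, while the large order pays a higher average price because it has to reach into deeper levels.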
### Research Objectives
The paper aims to develop an optimal execution strategy based on Reinforcement Learning that takes features of the current Limit Order Book (LOB) state as input. The model is trained to meet the following objectives:
1. **Minimize market impact**: Minimize the impact on market prices through reasonable trading speed and timing selection.
2. **Reduce trading costs**: Lower both direct trading fees and indirect costs (such as slippage and opportunity costs).
3. **Ensure execution efficiency**: Complete the trading task within the specified time to avoid additional penalties caused by unfinished transactions.
### Solutions
To achieve these goals, the authors adopt the following methods:
- **Multi-agent market simulator ABIDES**: Used to simulate the trading environment. It provides multiple depth levels of the limit order book and avoids the limitations of relying on historical data.
- **Custom MDP (Markov Decision Process) modeling**: Formalize the trading execution problem into an MDP, and define state, action, and reward functions.
- **Reinforcement learning algorithm DQN (Deep Q-Network)**: Used to train agents so that they can make optimal decisions during the trading process.
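To make the state, action, and reward components concrete, here is a minimal toy sketch of an execution MDP for a sell order. It is not the paper's formulation and does not use ABIDES: the `ToyExecutionEnv` class, the action set, the linear impact term, the random-walk mid price, and the terminal penalty are all assumptions chosen for illustration. A DQN agent would replace the fixed policy passed to `run_episode`.

```python
import random
from dataclasses import dataclass

# Hypothetical action set: multiples of the TWAP-sized slice to sell this step.
ACTIONS = (0.0, 0.5, 1.0, 2.0)


@dataclass
class State:
    steps_left: int    # remaining decision steps
    inventory: int     # shares still to sell
    mid_price: float   # stand-in for LOB-derived features


class ToyExecutionEnv:
    """Toy sell-side execution MDP (assumed dynamics, for illustration only)."""

    def __init__(self, total_shares=1000, horizon=10, start_price=100.0,
                 impact=1e-4, penalty=0.05, seed=0):
        self.total_shares = total_shares
        self.horizon = horizon
        self.start_price = start_price
        self.impact = impact      # assumed linear temporary impact per share
        self.penalty = penalty    # assumed cost per unsold share at the deadline
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        self.state = State(self.horizon, self.total_shares, self.start_price)
        return self.state

    def step(self, action_index):
        s = self.state
        slice_size = self.total_shares / self.horizon
        qty = min(s.inventory, int(round(ACTIONS[action_index] * slice_size)))
        # Selling more in one step lowers the achieved price.
        exec_price = s.mid_price * (1.0 - self.impact * qty)
        # Reward: revenue relative to the arrival price.
        reward = qty * (exec_price - self.start_price)
        inventory = s.inventory - qty
        steps_left = s.steps_left - 1
        if steps_left == 0 and inventory > 0:
            reward -= self.penalty * inventory  # penalize unfinished execution
        done = steps_left == 0 or inventory == 0
        mid = s.mid_price + self.rng.gauss(0.0, 0.05)  # random-walk mid price
        self.state = State(steps_left, inventory, mid)
        return self.state, reward, done


def run_episode(env, policy):
    """Roll out one episode; `policy` maps a State to an index into ACTIONS."""
    state = env.reset()
    total, done, steps = 0.0, False, 0
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
        steps += 1
    return state, total, steps


# A fixed policy that always sells one TWAP-sized slice (action index 2).
final_state, total_reward, n_steps = run_episode(ToyExecutionEnv(), lambda s: 2)
```

The terminal penalty mirrors the third objective above: an agent that never trades ends the episode with its full inventory and pays the penalty on every unsold share.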
### Main Contributions
The main contribution of the paper lies in demonstrating the potential of Reinforcement Learning for the optimal execution problem, especially in high-dimensional and complex environments that traditional analytical methods struggle to handle. According to the reported experiments, Reinforcement Learning-based strategies manage market impact more effectively than traditional execution strategies while maintaining lower trading costs.
### Formula Summary
The key formulas involved in the paper are as follows:
- **Average liquidation price**:
\[
P_k=\text{average liquidation price at step }k
\]
- **Total liquidation cost**:
\[
\min_{x\in A}E\left[\sum_{k = 0}^{N}P_kx_k\right]
\]
where \(A=\left\{\{x_0,x_1,\ldots,x_N\}\in\mathbb{R}^{N + 1}_+;\sum_{k = 0}^{N}x_k=X_0\right\}\).
- **Time-weighted average price (TWAP)**:
\[
\text{TWAP}=\frac{X_0}{N}\sum_{k = 0}^{N}P_k
\]
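As a rough numerical illustration of the formulas above, the sketch below checks whether a schedule belongs to the admissible set \(A\), evaluates \(\sum_k P_k x_k\) for one realized price path, and builds an equal-slice TWAP schedule for comparison. The function names, prices, and schedules are hypothetical. The sketch splits \(X_0\) evenly over however many prices are supplied, which is an assumption about the indexing convention, and it evaluates a single price path rather than the expectation in the objective.

```python
import math


def is_admissible(schedule, total_shares, tol=1e-9):
    """Check that all slices are non-negative and sum to the total inventory."""
    return (all(x >= 0 for x in schedule)
            and math.isclose(sum(schedule), total_shares, abs_tol=tol))


def realized_cost(prices, schedule):
    """Evaluate sum_k P_k * x_k for one realized price path."""
    if len(prices) != len(schedule):
        raise ValueError("prices and schedule must have the same length")
    return sum(p * x for p, x in zip(prices, schedule))


def twap_schedule(total_shares, num_slices):
    """Split an integer inventory into near-equal slices (TWAP-style)."""
    base, extra = divmod(total_shares, num_slices)
    return [base + (1 if i < extra else 0) for i in range(num_slices)]


# Hypothetical price path and two candidate schedules for X_0 = 1000 shares.
prices = [100.0, 101.0, 99.0, 100.0]
front_loaded = [400, 300, 200, 100]
twap = twap_schedule(1000, len(prices))

front_cost = realized_cost(prices, front_loaded)
twap_cost = realized_cost(prices, twap)
```

Both schedules are admissible; which one is cheaper depends on the price path, which is why the objective is stated as an expectation.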
With these formulas and methods, the paper addresses the key problems in large-scale trade execution and suggests that Reinforcement Learning is a promising direction for future execution strategies.