EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading

Molei Qin, Shuo Sun, Wentao Zhang, Haochong Xia, Xinrun Wang, Bo An
2023-09-22
Abstract: High-frequency trading (HFT) uses computer algorithms to make trading decisions on short time scales (e.g., second-level) and is widely used in the Cryptocurrency (Crypto) market (e.g., Bitcoin). Reinforcement learning (RL) in financial research has shown stellar performance on many quantitative trading tasks. However, most methods focus on low-frequency trading, e.g., day-level, and cannot be directly applied to HFT because of two challenges. First, RL for HFT involves dealing with extremely long trajectories (e.g., 2.4 million steps per month), which are hard to optimize and evaluate. Second, the dramatic price fluctuations and market trend changes of Crypto make existing algorithms fail to maintain satisfactory performance. To tackle these challenges, we propose an Efficient hieArchical Reinforcement learNing method for High Frequency Trading (EarnHFT), a novel three-stage hierarchical RL framework for HFT. In stage I, we compute a Q-teacher, i.e., the optimal action value based on dynamic programming, for enhancing the performance and training efficiency of second-level RL agents. In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates, where hundreds of RL agents are trained with different preferences of return rates and only a tiny fraction of them are selected into the pool based on their profitability. In stage III, we train a minute-level router which dynamically picks a second-level agent from the pool to achieve stable performance across different markets. Through extensive experiments in various market trends on Crypto markets in a high-fidelity simulation trading environment, we demonstrate that EarnHFT significantly outperforms 6 state-of-the-art baselines in 6 popular financial criteria, exceeding the runner-up by 30% in profitability.
Trading and Market Microstructure
What problem does this paper attempt to address?
The paper proposes a new approach to high-frequency trading (HFT) in the cryptocurrency market. It targets two main problems:

1. **Low data efficiency due to extremely long trajectories**: In HFT, a decision is made every second, so one month of trading yields roughly 2.4 million steps. Trajectories of this length are hard to optimize and evaluate, and reinforcement learning (RL) training needs more data to converge, which increases the demand for computational resources.
2. **Performance degradation under drastic market changes**: The cryptocurrency market is highly volatile, with frequent changes in market trend. Agents trained on historical data often fail to maintain good trading performance when the trend shifts significantly.

To address these challenges, the paper proposes EarnHFT (Efficient Hierarchical Reinforcement Learning for High Frequency Trading), a framework with three stages:

- **Stage I**: A Q-teacher, i.e., the optimal action value computed by dynamic programming, is used to assist the training of second-level RL agents, improving their performance and training efficiency.
- **Stage II**: Hundreds of second-level RL agents are trained with different return-rate preferences, and only a small fraction of them, chosen by profitability, is kept as a pool of agents suited to different market trends.
- **Stage III**: A minute-level router is trained to dynamically pick a second-level agent from the pool, so that performance stays stable across different market conditions.

In experiments across various market trends in a high-fidelity simulated trading environment, EarnHFT significantly outperforms six state-of-the-art baselines on six popular financial criteria and exceeds the runner-up by 30% in profitability.
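To make the Stage I idea more concrete, the following is a minimal sketch of how an optimal action value could be computed by backward dynamic programming over a known price series. It is not the paper's formulation: the function name `optimal_action_values`, the discrete position levels, the proportional fee model, and the reward definition are all assumptions chosen for illustration.

```python
import numpy as np


def optimal_action_values(prices, n_positions=2, fee_rate=0.0):
    """Backward DP over a fully known price path (illustrative assumptions only).

    State: time step t and current position level pos in {0, ..., n_positions-1}.
    Action: target position level a held from t to t+1.
    Reward (assumed): a * (p[t+1] - p[t]) - fee_rate * p[t] * |a - pos|.
    Returns q with shape (T, n_positions, n_positions), where q[t, pos, a]
    is the best achievable total reward from t onward after taking action a.
    """
    prices = np.asarray(prices, dtype=float)
    n_steps = len(prices) - 1
    levels = np.arange(n_positions, dtype=float)
    # turnover[pos, a] = number of units traded when moving from pos to a
    turnover = np.abs(levels[None, :] - levels[:, None])

    q = np.zeros((n_steps, n_positions, n_positions))
    v_next = np.zeros(n_positions)  # value after the final step is zero
    for t in range(n_steps - 1, -1, -1):
        pnl = levels * (prices[t + 1] - prices[t])   # indexed by action a
        cost = fee_rate * prices[t] * turnover       # indexed by [pos, a]
        q[t] = pnl[None, :] - cost + v_next[None, :]
        v_next = q[t].max(axis=1)                    # best value per position
    return q


if __name__ == "__main__":
    q = optimal_action_values([100.0, 101.0, 100.0, 102.0])
    # Greedy action at each step when starting each step flat (pos = 0)
    print(q[:, 0, :].argmax(axis=1))
```

Because the table is computed from future prices, it is only available on historical data. In a setup like the one described above, such values would serve as a training-time teaching signal for the second-level agents, not as a deployable policy.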