Deep Stock Trading: A Hierarchical Reinforcement Learning Framework for Portfolio Optimization and Order Execution

Rundong Wang, Hongxin Wei, Bo An, Zhouyan Feng, Jun Yao
DOI: https://doi.org/10.48550/arXiv.2012.12620
2021-02-07
Abstract: Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial-and-error. Existing methods are impractical since they usually assume each reallocation can be finished immediately and thus ignore the price slippage as part of the trading cost. To address these issues, we propose a hierarchical reinforced stock trading system for portfolio management (HRPM). Concretely, we decompose the trading process into a hierarchy of portfolio management over trade execution and train the corresponding policies. The high-level policy gives portfolio weights at a lower frequency to maximize the long-term profit and invokes the low-level policy to sell or buy the corresponding shares within a short time window at a higher frequency to minimize the trading cost. We train two levels of policies via a pre-training scheme and an iterative training scheme for data efficiency. Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvement against many state-of-the-art approaches.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is that existing portfolio management methods are impractical in actual trading. Existing reinforcement learning methods usually assume that each asset reallocation can be completed immediately, and so they ignore price slippage as part of the trading cost. This assumption does not hold in real trading environments, where price slippage is a non-negligible cost. In addition, a single, flat reinforcement learning algorithm has difficulty handling tasks at different time scales, because it must balance long-term profit maximization against short-term trade execution.

To solve these problems, the authors propose a hierarchical reinforced stock trading system for portfolio management (HRPM), which optimizes portfolio management and order execution through a hierarchical decision-making process. HRPM decomposes the trading process into two levels, high-level portfolio management and low-level trade execution, and trains a policy for each. The high-level policy gives portfolio weights at a lower frequency to maximize long-term profit. It invokes the low-level policy, which buys or sells the corresponding shares at a higher frequency within a short time window to minimize the trading cost.

In this way, HRPM aims to model the actual trading environment more realistically and to improve portfolio management when trading costs are considered. The experimental results reported for the U.S. market and the China market show that HRPM significantly outperforms many existing methods.
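The two-level decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the fixed equal-weight high-level rule, the equal-slice low-level rule, and the linear price-impact model with its `impact` parameter are all assumptions made for illustration. In HRPM both levels are learned policies.

```python
from typing import List


def high_level_policy(num_assets: int) -> List[float]:
    """Hypothetical stand-in for the learned high-level policy.

    Returns target portfolio weights; here simply equal weights.
    """
    return [1.0 / num_assets] * num_assets


def rebalance_orders(weights: List[float], holdings: List[float],
                     prices: List[float]) -> List[float]:
    """Convert target weights into signed share orders (+ buy, - sell)."""
    value = sum(h * p for h, p in zip(holdings, prices))
    return [w * value / p - h for w, h, p in zip(weights, holdings, prices)]


def low_level_policy(remaining_shares: float, steps_left: int) -> float:
    """Hypothetical stand-in for the learned low-level policy.

    Trades an equal slice of the remaining order at every step.
    """
    return remaining_shares / steps_left


def execute_order(total_shares: float, window_prices: List[float],
                  impact: float) -> float:
    """Execute one signed order over a short window and return the cash paid.

    Assumed toy slippage model: the executed price moves against the trader
    in proportion to the size of each slice.
    """
    remaining = total_shares
    cash = 0.0
    for t, price in enumerate(window_prices):
        qty = low_level_policy(remaining, len(window_prices) - t)
        exec_price = price * (1.0 + impact * qty)  # buys pay more, sells receive less
        cash += qty * exec_price
        remaining -= qty
    return cash


def slippage_cost(total_shares: float, window_prices: List[float],
                  impact: float) -> float:
    """Cost relative to trading everything at the first (arrival) price."""
    paid = execute_order(total_shares, window_prices, impact)
    return paid - total_shares * window_prices[0]


# High level: decide the target weights, then derive the orders to execute.
orders = rebalance_orders(high_level_policy(2), holdings=[10.0, 0.0],
                          prices=[100.0, 50.0])

# Low level: compare immediate execution with execution split over 5 steps.
cost_immediate = slippage_cost(10.0, [100.0], impact=0.001)
cost_split = slippage_cost(10.0, [100.0] * 5, impact=0.001)
```

Under this toy impact model with a flat price, splitting a 10-share buy into five equal slices lowers the slippage cost from about 10 to about 2. This is the kind of cost that is missed by methods assuming instantaneous reallocation. Real slippage depends on market conditions that this sketch does not model.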