Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study

Yilie Huang, Yanwei Jia, Xun Yu Zhou
2024-12-08
Abstract: We study continuous-time mean–variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes yet the coefficients of these processes are unknown. Based on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL algorithm that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. For multi-stock Black–Scholes markets without factors, we further devise a baseline algorithm and prove its performance guarantee by deriving a sublinear regret bound in terms of Sharpe ratio. For performance enhancement and practical implementation, we modify the baseline algorithm into four variants, and carry out an extensive empirical study to compare their performance, in terms of a host of common metrics, with a large number of widely used portfolio allocation strategies on S&P 500 constituents. The results demonstrate that the continuous-time RL strategies are consistently among the best especially in a volatile bear market, and decisively outperform the model-based continuous-time counterparts by significant margins.
Portfolio Management, Machine Learning, Systems and Control, Optimization and Control
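The abstract measures regret in terms of the Sharpe ratio. The sketch below shows what that benchmark is in a factor-free Black–Scholes market. It is not the paper's regret definition, which is not reproduced here.

It assumes a two-stock market with hypothetical parameters. It writes b = μ − r·e for the excess drifts and Σ = σσᵀ for the return covariance. The instantaneous Sharpe ratio of holding dollar amounts proportional to a vector v is vᵀb / sqrt(vᵀΣv). By Cauchy–Schwarz, this is maximized by v ∝ Σ⁻¹b, with value sqrt(bᵀΣ⁻¹b). The helper names are illustrative.

```python
import numpy as np

def sharpe_ratio(v, b, cov):
    """Instantaneous Sharpe ratio of holding risky dollar amounts proportional to v.

    b: excess expected returns (mu - r), cov: return covariance matrix Sigma.
    """
    v = np.asarray(v, dtype=float)
    return float(v @ b / np.sqrt(v @ cov @ v))

def max_sharpe_ratio(b, cov):
    """Maximal Sharpe ratio sqrt(b^T Sigma^{-1} b), attained by v proportional to Sigma^{-1} b."""
    return float(np.sqrt(b @ np.linalg.solve(cov, b)))

# Hypothetical two-stock market parameters (illustrative only, not from the paper)
b = np.array([0.05, 0.03])                   # excess drifts mu - r
vol = np.array([0.20, 0.15])                 # stock volatilities
corr = np.array([[1.0, 0.3], [0.3, 1.0]])    # correlation matrix
cov = np.outer(vol, vol) * corr              # Sigma = sigma sigma^T

v_star = np.linalg.solve(cov, b)             # Sharpe-optimal direction Sigma^{-1} b
v_equal = np.array([1.0, 1.0])               # naive equal-dollar allocation

# Shortfall of the naive allocation relative to the best achievable Sharpe ratio
gap = max_sharpe_ratio(b, cov) - sharpe_ratio(v_equal, b, cov)
print(sharpe_ratio(v_star, b, cov), max_sharpe_ratio(b, cov), gap)
```

The gap is nonnegative for any allocation. A learning algorithm that does not know b and Σ pays such a shortfall while it explores. How the paper aggregates that shortfall into its regret bound is a detail of the paper itself.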
What problem does this paper attempt to address?
This paper asks how reinforcement learning (RL) can be used to select portfolios that achieve mean-variance (MV) efficient allocations in continuous-time markets. It considers an investor who observes stock prices and market factors but does not know the market's model parameters. The question is how such an investor can learn the optimal investment strategy directly from data with an RL algorithm, without estimating or learning the market coefficients.

### Core Problems of the Paper

1. **Dynamic mean-variance portfolio selection**: The mean-variance framework has traditionally been used for static (single-period) portfolio selection, and applying static strategies in a dynamic environment is inefficient. The paper studies dynamic mean-variance portfolio selection in continuous time.
2. **Unknown market coefficients**: Most existing methods rely on accurate estimates of the moments of asset returns, especially expected returns, which are very difficult and error-prone to estimate in practice. The paper proposes a method that does not require estimating market coefficients.
3. **Application of reinforcement learning**: The paper uses continuous-time RL theory for diffusion processes to learn the optimal investment strategy directly from data. This avoids the estimation errors and parameter sensitivity of traditional methods.

### Main Contributions

1. **RL algorithms**: Building on the continuous-time RL theory of Wang et al. (2020) and Jia and Zhou (2022a, b), the paper proposes RL algorithms for the MV problem. These algorithms learn from data by solving moment conditions derived from certain martingale conditions.
2. **Theoretical guarantees and regret analysis**: For the multi-stock Black–Scholes market without factors, the paper designs a baseline algorithm and proves its convergence and a sublinear regret bound in terms of the Sharpe ratio. This is the first model-free regret analysis for continuous-time MV portfolio selection.
3. **Empirical study**: In an extensive empirical study on S&P 500 constituents, the RL strategies are compared with 15 other popular portfolio allocation methods. The RL strategies outperform the classical model-based continuous-time methods on all metrics considered, especially in a volatile bear market.

### Formula Examples

Some of the key formulas in the paper are listed below.

- Wealth equation, where e is the d-dimensional vector of ones and wealth and dollar holdings are measured in discounted terms:
\[ dx^{u}(t)=\sum_{i=1}^{d}u_i(t)\frac{dS_i(t)}{S_i(t)}-e^\top u(t)\frac{dS_0(t)}{S_0(t)} \]
- Mean-variance problem:
\[ \min_u \operatorname{Var}\left(x^{u}(T)\right) \quad \text{subject to} \quad E\left[x^{u}(T)\right]=z \]
- Entropy-regularized objective, where w is an auxiliary parameter arising from the Lagrange multiplier of the constraint E[x(T)] = z, γ > 0 is a temperature parameter, and the stochastic policy π may depend on the factors F(t):
\[ \min_{\pi} E\left[\left(x^{\pi}(T)-w\right)^2+\gamma\int_0^T\log\pi\left(u^{\pi}(t)\mid t,x^{\pi}(t),F(t)\right)dt\right]-(w-z)^2 \]
  Since E[log π] is the negative of the policy's entropy, minimizing this objective rewards exploration.

With these tools, the paper aims to provide a more robust and effective approach to dynamic portfolio selection when the market coefficients are unknown.
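The following Monte Carlo sketch makes the wealth equation and the entropy-regularized objective concrete. It is an illustration, not the paper's algorithm, and it rests on several assumptions:

- The market is a factor-free two-stock Black–Scholes market with hypothetical parameters, dS_i/S_i = μ_i dt + (σ dW)_i and dS_0/S_0 = r dt.
- The discounted wealth therefore evolves as dx = uᵀ(b dt + σ dW) with b = μ − r·e.
- The policy is a hypothetical Gaussian with mean −K(x − w) and constant covariance C.

Setting K = (σσᵀ)⁻¹b mirrors the classical MV feedback form. The dynamics are discretized with an Euler scheme. The sketch estimates the minimization form E[(x(T) − w)² + γ∫₀ᵀ log π dt] − (w − z)². The names `simulate_episodes` and `regularized_objective` are illustrative.

```python
import numpy as np

def simulate_episodes(K, C, w, b, sigma, x0=1.0, T=1.0, n_steps=250,
                      n_paths=10_000, seed=0):
    """Euler scheme for the discounted wealth dx = u^T (b dt + sigma dW).

    Hypothetical Gaussian policy: u ~ N(-K (x - w), C) with constant K (d,)
    and constant covariance C (d, d). Returns terminal wealth per path and
    the pathwise integral of log pi(u | x) dt.
    """
    rng = np.random.default_rng(seed)
    d = len(b)
    dt = T / n_steps
    chol = np.linalg.cholesky(C)          # C = chol @ chol.T, used to sample actions
    C_inv = np.linalg.inv(C)
    _, logdet = np.linalg.slogdet(C)
    x = np.full(n_paths, x0, dtype=float)
    log_pi_int = np.zeros(n_paths)
    for _ in range(n_steps):
        mean = -np.outer(x - w, K)                               # policy mean, (n_paths, d)
        u = mean + rng.standard_normal((n_paths, d)) @ chol.T    # sampled dollar holdings
        diff = u - mean
        # Gaussian log-density log pi(u | x) for each path
        log_pi = -0.5 * (np.einsum("ij,jk,ik->i", diff, C_inv, diff)
                         + d * np.log(2 * np.pi) + logdet)
        log_pi_int += log_pi * dt                                # accumulates int log pi dt
        dW = rng.standard_normal((n_paths, d)) * np.sqrt(dt)
        x = x + (u @ b) * dt + np.sum((u @ sigma) * dW, axis=1)  # Euler step of dx
    return x, log_pi_int

def regularized_objective(x_T, log_pi_int, w, z, gamma):
    """Monte Carlo estimate of E[(x(T) - w)^2 + gamma * int log pi dt] - (w - z)^2."""
    return float(np.mean((x_T - w) ** 2 + gamma * log_pi_int) - (w - z) ** 2)

# Hypothetical two-stock market and policy parameters (illustration only)
b = np.array([0.05, 0.03])                 # excess drifts mu - r
sigma = np.diag([0.20, 0.15])              # volatility matrix
K = np.linalg.solve(sigma @ sigma.T, b)    # K = (sigma sigma^T)^{-1} b; policy mean is -K (x - w)
C = 0.01 * np.eye(2)                       # constant exploration covariance
x0, w, z, T, gamma = 1.0, 1.4, 1.2, 1.0, 0.1

x_T, log_pi_int = simulate_episodes(K, C, w, b, sigma, x0=x0, T=T)
print(x_T.mean(), x_T.var(), regularized_objective(x_T, log_pi_int, w, z, gamma))
```

Here w is fixed by hand and K is computed from b and σ, which an RL agent would not know. In algorithms of the Wang et al. (2020) type, the policy parameters are learned from observed wealth trajectories. The parameter w is typically adjusted so that the terminal mean matches z.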