Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach

Mao Guan, Xiao-Yang Liu
DOI: https://doi.org/10.48550/arXiv.2111.03995
2021-12-19
Abstract: Deep reinforcement learning (DRL) has been widely studied in the portfolio management task. However, it is challenging to understand a DRL-based trading strategy because of the black-box nature of deep neural networks. In this paper, we propose an empirical approach to explain the strategies of DRL agents for the portfolio management task. First, we use a linear model in hindsight as the reference model, which finds the best portfolio weights by assuming the actual stock returns are known in advance. In particular, we use the coefficients of a linear model in hindsight as the reference feature weights. Secondly, for DRL agents, we use integrated gradients to define the feature weights, which are the coefficients between reward and features under a linear regression model. Thirdly, we study the prediction power in two cases, single-step prediction and multi-step prediction. In particular, we quantify the prediction power by calculating the linear correlations between the feature weights of a DRL agent and the reference feature weights, and similarly for machine learning methods. Finally, we evaluate a portfolio management task on the Dow Jones 30 constituent stocks from 01/01/2009 to 09/01/2021. Our approach empirically reveals that a DRL agent exhibits stronger multi-step prediction power than machine learning methods.
Portfolio Management, Artificial Intelligence
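The abstract's second step relies on integrated gradients. Below is a minimal NumPy sketch of that technique, not the authors' code. It makes several illustrative assumptions:

- A scalar function `f` stands in for the trained DRL network.
- The path integral from a baseline to the input is approximated with a midpoint Riemann sum.
- Gradients come from central finite differences. A real agent would use autodiff.

The toy model `toy_value`, its weights `W`, and the zero baseline are all hypothetical.

```python
import numpy as np


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x (stand-in for autodiff)."""
    grad = np.zeros_like(x, dtype=float)
    for k in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[k] = eps
        grad[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Integrated gradients of f at x relative to baseline, via a midpoint Riemann sum.

    IG_k = (x_k - baseline_k) * integral_0^1 df/dx_k(baseline + a * (x - baseline)) da
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    avg_grad = np.zeros_like(x)
    for a in alphas:
        avg_grad += numerical_grad(f, baseline + a * (x - baseline))
    avg_grad /= steps
    return (x - baseline) * avg_grad


# Hypothetical toy model over 4 input features (e.g. 4 technical indicators).
# In the paper's setting this would be the trained DRL network.
W = np.array([0.8, -0.5, 0.3, 0.1])


def toy_value(features):
    return float(np.tanh(W @ features) + 0.1 * features @ features)


x = np.array([0.5, 1.0, -0.2, 0.3])
baseline = np.zeros_like(x)
attributions = integrated_gradients(toy_value, x, baseline)
print(attributions)
# Completeness check: attributions sum to f(x) - f(baseline), up to discretization error.
print(attributions.sum(), toy_value(x) - toy_value(baseline))
```

For a linear `f`, each attribution reduces to coefficient × (feature − baseline). This is one way to see how integrated-gradient attributions relate to the coefficients of a linear model between reward and features.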
What problem does this paper attempt to address?
The paper addresses the difficulty of interpreting deep reinforcement learning (DRL) strategies in portfolio management. Because deep neural networks are black boxes, the trading strategy a DRL agent has learned is hard to understand. The authors propose an empirical method to explain the strategies of DRL agents in this task. It proceeds in four steps:

1. **Reference Model**: Use a linear model in hindsight (ex post) as the reference model. It finds the optimal portfolio weights by assuming the actual stock returns are known in advance. Its coefficients serve as the reference feature weights.
2. **Feature Weight Definition**: For DRL agents, use integrated gradients to define the feature weights. These are the coefficients between rewards and features under a linear regression model.
3. **Quantifying Prediction Power**: Study both single-step and multi-step prediction. Quantify prediction power as the linear correlation between an agent's feature weights and the reference feature weights, and compute the same measure for conventional machine learning methods.
4. **Empirical Evaluation**: Evaluate the portfolio management task on the 30 constituent stocks of the Dow Jones from January 1, 2009 to September 1, 2021. The results show that DRL agents have stronger multi-step prediction power than conventional machine learning methods.

### Main Contributions

- Propose a new empirical method for understanding the strategies of DRL agents in portfolio management.
- Use integrated gradients to define the feature weights of DRL agents.
- Quantify prediction power through the linear correlation between feature weights.
- Show empirically that DRL agents have stronger multi-step prediction power, which the authors link to their better trading performance.

### Method Overview

1. **Feature Weights**: Quantify the relationship between input features and rewards (i.e., portfolio returns).
2. **Reference Feature Weights**: Use the coefficients of the linear model in hindsight as the reference feature weights.
3. **Feature Weights of DRL Agents**: Use integrated gradients to define the feature weights.
4. **Comparison of Prediction Power**: Compare single-step and multi-step prediction power by calculating linear correlations.

### Experimental Results

- **Trading Performance**: DRL agents perform best in annualized return and Sharpe ratio, especially the agent trained with the PPO algorithm.
- **Explanation Analysis**: DRL agents show stronger multi-step prediction power than conventional machine learning methods. The reverse holds for single-step prediction power.

### Conclusion

Using an empirical approach, the paper reveals the advantage of DRL agents in portfolio management, particularly their multi-step prediction power. Future work will explore explanation methods for other DRL algorithms and apply them to other financial tasks such as trading, hedging, and risk management.
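The reference feature weights and the correlation-based comparison can be illustrated with a small NumPy sketch. It rests on our own simplifying assumptions rather than the paper's exact definitions:

- **Reference weights**: ordinary least-squares coefficients from regressing realized returns, known only in hindsight, on the features.
- **Single-step target**: the next-period return.
- **Multi-step target**: the cumulative return over the next `horizon` periods.
- **Prediction power**: the Pearson correlation between a model's feature weights and the reference weights.

The data are synthetic. The helpers `reference_feature_weights` and `prediction_power` and the `horizon` parameter are hypothetical names.

```python
import numpy as np


def reference_feature_weights(features, returns, horizon=1):
    """OLS coefficients of realized future returns on features, fitted in hindsight.

    features: (T, K) array of features observed at time t
    returns:  (T,) array, returns[t] is the return realized from t to t+1
    horizon:  1 for single-step; >1 uses the summed return over the next
              `horizon` periods as the target (an illustrative assumption)
    """
    T = len(returns)
    n = T - horizon + 1
    # Target for row t: returns[t] + ... + returns[t + horizon - 1]
    target = np.array([returns[t:t + horizon].sum() for t in range(n)])
    X = np.column_stack([np.ones(n), features[:n]])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[1:]  # drop the intercept: one weight per feature


def prediction_power(model_weights, reference_weights):
    """Pearson correlation between a model's feature weights and the reference weights."""
    return float(np.corrcoef(model_weights, reference_weights)[0, 1])


# Synthetic data: 4 features, returns linear in the features plus small noise.
rng = np.random.default_rng(0)
T, K = 500, 4
features = rng.normal(size=(T, K))
true_beta = np.array([0.02, -0.01, 0.005, 0.0])
returns = features @ true_beta + 0.001 * rng.normal(size=T)

ref_1 = reference_feature_weights(features, returns, horizon=1)  # single-step
ref_5 = reference_feature_weights(features, returns, horizon=5)  # multi-step
print("single-step reference weights:", ref_1)
print("multi-step reference weights:", ref_5)

# Hypothetical feature weights of two models (e.g. from integrated gradients).
model_a = ref_1 + 0.002 * rng.normal(size=K)  # roughly aligned with the reference
model_b = -ref_1                              # exactly anti-aligned
print(prediction_power(model_a, ref_1), prediction_power(model_b, ref_1))
```

A correlation near 1 means a model weights features much as the hindsight model does. The paper compares this kind of correlation between DRL agents and machine learning methods at single-step and multi-step horizons. How it aggregates over time steps and stocks is not reproduced here.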