Abstract: Deep reinforcement learning (DRL) has been widely studied in the portfolio management task. However, it is challenging to understand a DRL-based trading strategy because of the black-box nature of deep neural networks. In this paper, we propose an empirical approach to explain the strategies of DRL agents for the portfolio management task. First, we use a linear model in hindsight as the reference model, which finds the best portfolio weights by assuming that the actual stock returns are known in advance. In particular, we use the coefficients of a linear model in hindsight as the reference feature weights. Secondly, for DRL agents, we use integrated gradients to define the feature weights, which are the coefficients between reward and features under a linear regression model. Thirdly, we study the prediction power in two cases, single-step prediction and multi-step prediction. In particular, we quantify the prediction power by calculating the linear correlations between the feature weights of a DRL agent and the reference feature weights, and similarly for machine learning methods. Finally, we evaluate a portfolio management task on Dow Jones 30 constituent stocks from 01/01/2009 to 09/01/2021. Our approach empirically reveals that a DRL agent exhibits a stronger multi-step prediction power than machine learning methods.
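The comparison step in the abstract (correlating a model's feature weights with reference feature weights obtained in hindsight) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the helper names `ols_slope` and `pearson`, the toy numbers, and the simplification of fitting one univariate regression per feature instead of the paper's full linear model.

```python
from math import sqrt


def ols_slope(x, y):
    """Slope of y regressed on a single feature x (with intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        raise ValueError("feature has zero variance")
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov_xy / var_x


def pearson(u, v):
    """Linear (Pearson) correlation between two weight vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("u and v must have equal length >= 2")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if std_u == 0 or std_v == 0:
        raise ValueError("weights have zero variance")
    return cov / (std_u * std_v)


# Hypothetical toy data: three features over four periods, plus the
# reward (portfolio return) that was actually realized in hindsight.
features = {
    "feature_a": [0.1, 0.2, 0.3, 0.4],
    "feature_b": [1.0, 0.5, 0.8, 0.2],
    "feature_c": [0.3, 0.1, 0.4, 0.2],
}
realized_reward = [0.01, 0.02, 0.03, 0.05]

# Reference feature weights: regression coefficients fitted with
# knowledge of the realized reward (simplified to one slope per feature).
reference_weights = [ols_slope(col, realized_reward) for col in features.values()]

# Hypothetical feature weights of a trained model (stand-in values).
agent_weights = [0.12, -0.03, 0.01]

# "Prediction power" in this sketch: correlation with the reference weights.
prediction_power = pearson(reference_weights, agent_weights)
```

For the multi-step case, the same comparison would presumably be repeated with the reward measured over a longer horizon; how the horizon is aggregated is not specified in the abstract.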

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the challenge of interpreting deep reinforcement learning (DRL) strategies in portfolio management tasks. Because of the black-box nature of deep neural networks, the trading strategies learned by DRL agents are difficult to understand. The authors propose an empirical method to interpret the strategies of DRL agents in portfolio management tasks, which consists of the following steps:

1. **Reference Model**: Use an ex-post (hindsight) linear model as a reference model, which assumes that actual stock returns are known in order to find the optimal portfolio weights. Specifically, use the coefficients of the ex-post linear model as reference feature weights.
2. **Feature Weight Definition**: For DRL agents, use Integrated Gradients to define feature weights, which are the coefficients of the linear regression model between rewards and features.
3. **Quantifying Predictive Ability**: Study single-step and multi-step prediction, and quantify the predictive ability by calculating the linear correlation between the feature weights of DRL agents and the reference feature weights. The same measure is applied to traditional machine learning methods.
4. **Empirical Evaluation**: Evaluate a portfolio management task on the 30 constituent stocks of the Dow Jones over the period from January 1, 2009 to September 1, 2021. The results show that DRL agents are superior to traditional machine learning methods in multi-step prediction ability.

### Main Contributions

- Propose a new empirical method to understand the strategies of DRL agents in portfolio management tasks.
- Use Integrated Gradients to define the feature weights of DRL agents.
- Quantify the predictive ability by calculating the linear correlation between feature weights.
- Empirical results show that DRL agents perform better in multi-step prediction ability, thus achieving better trading performance.

### Method Overview

1. **Feature Weights**: Quantify the relationship between input features and rewards (i.e., portfolio returns).
2. **Reference Feature Weights**: Use the coefficients of the ex-post linear model as reference feature weights.
3. **Feature Weights of DRL Agents**: Use Integrated Gradients to define feature weights.
4. **Comparison of Predictive Ability**: Compare single-step and multi-step prediction ability by calculating linear correlation.

### Experimental Results

- **Trading Performance**: DRL agents (especially agents using the PPO algorithm) perform best in terms of annualized return and Sharpe ratio.
- **Explanation Analysis**: DRL agents show significantly better multi-step prediction ability than traditional machine learning methods, while the opposite is true for single-step prediction ability.

### Conclusion

The paper empirically reveals the advantages of DRL agents in portfolio management tasks, especially in multi-step prediction ability. Future work will explore interpretation methods for other deep reinforcement learning algorithms and study their use in other financial applications, such as trading, hedging, and risk management.
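Integrated Gradients, which the summary names as the basis for the DRL feature weights, attributes a scalar output to each input by integrating the gradient along a straight path from a baseline to the input. The sketch below is a generic numerical approximation, not the paper's code: the function names, the all-zero baseline, the finite-difference gradient, and the toy function `toy_reward` are all assumptions chosen so that the example runs without any deep learning library.

```python
def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline=None, steps=50):
    """Approximate Integrated Gradients of f at x with a midpoint Riemann sum.

    IG_i = (x_i - b_i) * integral over alpha in [0, 1] of
           df/dx_i evaluated at b + alpha * (x - b).
    """
    if baseline is None:
        baseline = [0.0] * len(x)  # assumed baseline: all zeros
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_grad(f, point)
        total = [t + g for t, g in zip(total, grad)]
    # Average gradient along the path, scaled by the input difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_reward(x):
    """Hypothetical stand-in for a model output: x0^2 + 3 * x1."""
    return x[0] ** 2 + 3.0 * x[1]


attributions = integrated_gradients(toy_reward, [2.0, 1.0])
```

A useful sanity check is the completeness property: the attributions should sum to approximately `f(x) - f(baseline)`. In a real setting, the finite-difference gradient would typically be replaced by automatic differentiation through the trained policy network.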

Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach

Explainable Post hoc Portfolio Management Financial Policy of a Deep Reinforcement Learning agent

A Deep Reinforcement Learning Model for Portfolio Management Incorporating Historical Stock Prices and Risk Information

Combining Transformer based Deep Reinforcement Learning with Black-Litterman Model for Portfolio Optimization

A Deep Reinforcement Learning Approach for Portfolio Management in Non‐Short‐Selling Market

A Novel Anti-Risk Method for Portfolio Trading Using Deep Reinforcement Learning

XPM: an Explainable Deep Reinforcement Learning Framework for Portfolio Management

Portfolio Management Based on Deep Reinforcement Learning Method with Data Augment

Deep Reinforcement Learning Model for Stock Portfolio Management Based on Data Fusion

A Deep Reinforcement Learning Framework For Financial Portfolio Management

Deep Reinforcement Learning Approach to Portfolio Optimization in the Australian Stock Market

DeepTrader: A Deep Reinforcement Learning Approach for Risk-Return Balanced Portfolio Management with Market Conditions Embedding

Adversarial Deep Reinforcement Learning in Portfolio Management

A Deep Residual Shrinkage Neural Network-based Deep Reinforcement Learning Strategy in Financial Portfolio Management

Dynamic Optimization of Portfolio Allocation Using Deep Reinforcement Learning

Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization

Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation

Dynamic Graph-based Deep Reinforcement Learning with Long and Short-term Relation Modeling for Portfolio Optimization

Evaluation of Deep Reinforcement Learning Based Stock Trading

Practical Deep Reinforcement Learning Approach for Stock Trading

Bridging the gap between Markowitz planning and deep reinforcement learning