Abstract: Deep reinforcement learning (DRL) can be used to extract deep features that can be incorporated into reinforcement learning systems to enable improved decision-making; DRL can therefore also be used for managing stock portfolios. Traditional methods cannot fully exploit the advantages of DRL because they are generally based on real-time stock quotes, which do not have sufficient features for making comprehensive decisions. In this study, in addition to stock quotes, we introduced stock financial indices as additional stock features. Moreover, we used Markowitz mean-variance theory for determining stock correlation. A three-agent deep reinforcement learning model called Collaborative Multi-agent reinforcement learning-based stock Portfolio management System (CMPS) was designed and trained based on fused data. In CMPS, each agent was implemented with a deep Q-network to obtain the features of time-series stock data, and a self-attention network was used to combine the output of each agent. We added a risk-free asset strategy to CMPS to prevent risks and referred to this model as CMPS-Risk Free (CMPS-RF). We conducted experiments under different market conditions using the stock data of China Shanghai Stock Exchange 50 and compared our model with the state-of-the-art models. The results showed that CMPS could obtain better profits than the compared benchmark models, and CMPS-RF was able to accurately recognize the market risk and achieved the best Sharpe and Calmar ratios. The study findings are expected to aid in the development of an efficient investment-trading strategy.
What problem does this paper attempt to address?
The paper addresses how, in stock portfolio management, deep reinforcement learning (DRL) combined with data fusion can improve the quality of investment decisions, so as to achieve a higher return on investment while managing market risk. Specifically, it proposes CMPS, a stock portfolio management system based on multi-agent deep reinforcement learning. Introducing stock financial indices as additional features and using Markowitz mean-variance theory to determine stock correlations enhance the model's decision-making ability. The paper also proposes a variant with a risk-free asset strategy (CMPS-RF) that balances risk and return and further improves risk management when the market fluctuates strongly.
### Main Contributions
1. **Proposing the CMPS Model**: A collaborative three-agent DRL model (CMPS) is constructed. Each agent uses a DQN structure to extract different features, and all features are combined through a self-attention network to obtain a comprehensive reward.
2. **Data Fusion Method**: CMPS uses data fusion to obtain additional features. In particular, the financial reports of stocks are combined with real-time trading information, which enriches the state of the DRL agents and improves long-term prediction ability.
3. **Embedding Stock Correlations**: Stock correlations are embedded based on Markowitz mean-variance theory, which gives the model more information and helps it discover relationships between financial assets.
4. **Risk Control Strategy**: A risk-free asset strategy (CMPS-RF) is proposed. By weighting the stock holdings against a risk-free asset, the model avoids investment risk when the stock market fluctuates strongly.
5. **Experimental Verification**: Stock data of the Shanghai Stock Exchange (SSE) 50 Index in China are used to generate two data sets, one reflecting a stable market and one a declining market. Five state-of-the-art models are evaluated on six financial indicators. The experimental results show that CMPS outperforms the other models and that CMPS-RF has significant advantages in risk management.
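
Contribution 1 describes combining the three agents' features through a self-attention network. The sketch below illustrates that idea in NumPy. It assumes single-head scaled dot-product attention, random projection matrices standing in for learned ones, and mean pooling over agents. None of these choices is confirmed by the paper, and names such as `self_attention_fuse` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_fuse(agent_outputs, w_q, w_k, w_v):
    """Fuse per-agent feature vectors with scaled dot-product self-attention.

    agent_outputs: array of shape (K, d), one row per agent.
    Returns the fused vector of shape (d_v,) and the (K, K) attention weights.
    """
    q = agent_outputs @ w_q
    k = agent_outputs @ w_k
    v = agent_outputs @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    attended = weights @ v               # (K, d_v)
    fused = attended.mean(axis=0)        # mean pooling is an assumption
    return fused, weights


rng = np.random.default_rng(0)
K, d = 3, 8                              # three agents; d is illustrative
agent_outputs = rng.normal(size=(K, d))  # stand-in for DQN feature outputs
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
fused, attn = self_attention_fuse(agent_outputs, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of `attn` shows how strongly one agent's features attend to the others, which is one way to inspect how the agents' outputs are being mixed.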
### Method Overview
- **State Representation**: The state \( s_t \) of the CMPS model includes a market quotation vector \( X_t \), a financial index tensor \( F_t \), and a stock correlation tensor \( C_t \).
  - \( X_t \): A market quotation vector within the time range \([t - l, t]\), with a shape of \((M + 1, l, N_p)\), where \( M \) is the number of stocks, \( l \) is the size of the time window, and \( N_p \) is the dimension of market quotations.
  - \( F_t \): A financial index tensor within the time range \([t - l, t]\), with a shape of \((M, l, N_f)\), where \( N_f \) is the number of features of the financial index.
  - \( C_t \): A stock correlation tensor computed over the time range \([t - l, t]\), with a shape of \((M + 1, M + 1)\), represented by a return covariance matrix.
- **Action Representation**: The action \( a_t \) of the CMPS model represents how the agents trade stocks given the current state. The action is discrete and only involves buying, holding, and selling stocks, corresponding to the discrete values -1, 0, and 1 respectively. The overall action \( a_t \) is expressed as follows:
\[
a_t=\{a_1^t, a_2^t, \cdots, a_K^t\}
\]
\[
a_i^t = \{a_{i,1}^t, a_{i,2}^t, \cdots, a_{i,M}^t\}
\]
\[
a_{i,c}^t \in \{-1, 0, 1\}
\]
where \( K \) is the number of agents, \( M \) is the number of stocks, and \( a_{i,c}^t \) is the action of agent \( i \) on stock \( c \).
- **Reward Mechanism**: The reward \( R_t \) is fed back to the agents by the environment and is used to guide the agents to optimize their strategies. The ultimate goal is to maximize the cumulative reward \( G_t \).
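
The shapes and definitions above can be made concrete with a small sketch. It builds placeholder tensors with the stated shapes, estimates \( C_t \) as a return covariance matrix, samples a joint action, and computes a cumulative reward. The sizes and random data are illustrative, and the discounted form of \( G_t \) with factor `gamma` is the standard reinforcement learning definition, assumed here because the summary does not give the formula. The summary also does not say what the extra \( M + 1 \)-th slot in \( X_t \) and \( C_t \) holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): stocks, window, quote dims, financial dims, agents.
M, l, N_p, N_f, K = 5, 10, 4, 6, 3

# Placeholder market quotations and financial indices with the stated shapes.
X_t = rng.normal(size=(M + 1, l, N_p))
F_t = rng.normal(size=(M, l, N_f))

# Correlation tensor as a return covariance matrix over the window.
# np.cov treats each row as one variable, so the result is (M + 1, M + 1).
window_returns = rng.normal(scale=0.01, size=(M + 1, l))
C_t = np.cov(window_returns)

state = {"X": X_t, "F": F_t, "C": C_t}

# Joint action: one value from {-1, 0, 1} per agent and per stock.
a_t = rng.integers(-1, 2, size=(K, M))


def discounted_return(rewards, gamma=0.99):
    """Cumulative reward G_t = sum_k gamma**k * R_{t+k} (assumed standard form)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print({name: arr.shape for name, arr in state.items()}, a_t.shape)
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```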
### Experimental Results
The experiments show that CMPS outperforms the benchmark models on multiple financial indicators. In particular, under large market fluctuations, CMPS-RF performs well in managing risk and balancing it against returns.
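
The abstract names the Sharpe and Calmar ratios among the evaluation indicators. The sketch below shows how the two are commonly computed from a series of periodic returns. It uses the textbook definitions, with annualization over 252 trading days and a zero risk-free rate as assumed defaults; the paper's exact formulas and parameters may differ.

```python
import numpy as np


def sharpe_ratio(returns, risk_free=0.0, periods=252):
    """Annualized Sharpe ratio: mean excess return over its sample std."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a fraction of the peak."""
    r = np.asarray(returns, dtype=float)
    # Prepend the initial capital of 1.0 so a loss in the first period counts.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    peak = np.maximum.accumulate(wealth)
    return float(np.max((peak - wealth) / peak))


def calmar_ratio(returns, periods=252):
    """Annualized compound return divided by maximum drawdown.

    Undefined when the drawdown is zero; callers should handle that case.
    """
    r = np.asarray(returns, dtype=float)
    annual = np.prod(1.0 + r) ** (periods / len(r)) - 1.0
    return annual / max_drawdown(r)


sample = [0.1, -0.5, 0.2]  # made-up returns, not data from the paper
print(max_drawdown(sample), calmar_ratio(sample, periods=3))
```

A higher Sharpe ratio means more return per unit of volatility, while a higher Calmar ratio means more return per unit of worst-case loss, which is why the latter is sensitive to how a strategy behaves in a declining market.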