GRU-PFG: Extract Inter-Stock Correlation from Stock Factors with Graph Neural Network

Yonggai Zhuang, Haoran Chen, Kequan Wang, Teng Fei
2024-11-28
Abstract: The complexity of stocks and industries presents challenges for stock prediction. Currently, stock prediction models can be divided into two categories. One category, represented by GRU and ALSTM, relies solely on stock factors for prediction, with limited effectiveness. The other category, represented by HIST and TRA, incorporates not only stock factors but also industry information, industry financial reports, public sentiment, and other inputs for prediction. The second category of models can capture correlations between stocks by introducing additional information, but the extra data is difficult to standardize and generalize. Considering the current state and limitations of these two types of models, this paper proposes the GRU-PFG (Project Factors into Graph) model. This model takes only stock factors as input and extracts inter-stock correlations using graph neural networks. It achieves prediction results that not only outperform other models relying solely on stock factors, but are also comparable to those of the second-category models. The experimental results show that on the CSI300 dataset, the IC of GRU-PFG is 0.134, outperforming HIST's 0.131 and significantly surpassing GRU and Transformer, achieving results better than the second-category models. Moreover, as a model that relies solely on stock factors, it has greater potential for generalization.
Computational Finance, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper aims to solve two main challenges faced in stock prediction models:

1. **Limitations of existing models**:
   - **Models relying only on stock factors (such as GRU and ALSTM)**: These models have standardized input data and are easy to generalize, but because they rely only on stock factors they do not effectively capture the relationships between stocks, which limits their prediction effectiveness.
   - **Models combining multi-source information (such as HIST and TRA)**: These models improve prediction accuracy by introducing additional information such as industry information, market trading data, and public sentiment. However, they face difficulties in data standardization, suffer from information lag, and struggle with new companies and emerging industries.
2. **Extracting correlations between stocks**:
   - The paper points out that the correlations between stocks contain a great deal of information helpful for prediction. Existing stock-factor-based models fail to fully mine these correlations, and while models combining multi-source information can capture some of them, their complexity and data acquisition difficulties limit their wide application.

To solve the above problems, the paper proposes a new stock prediction model, **GRU-PFG (Project Factors into Graph)**. This model uses only stock factors as input and extracts the correlations between stocks through a Graph Neural Network (GNN), achieving prediction performance comparable to or even better than models combining multi-source information, without relying on additional information.

### Main contributions of the model:

1. **Proposing a more effective stock prediction model**: GRU-PFG outperforms previous models that rely only on Alpha360 stock factors.
2. **Deeply mining the information in Alpha360 factors**: Compared with models such as ALSTM and GRU, GRU-PFG extracts more of the information contained in the stock factors.
3. **Having stronger generalization ability compared with multi-source information models**: Experimental results show that GRU-PFG performs better than multi-source information models such as HIST on the CSI300 dataset, and because it relies only on stock factors, it has greater generalization potential.

### Formula summary:

- **Stock change rate formula**: \[ \alpha=\frac{P_{t+1}-P_t}{P_t} \]
- **Daily stock return formula**: \[ \beta=\frac{P_{\text{close}}-P_{\text{open}}}{P_{\text{open}}} \]
- **Pearson correlation coefficient formula**, where \(\bar{F}_x\) and \(\bar{F}_y\) denote the means of \(F_{xi}\) and \(F_{yi}\) over \(i\): \[ R_{xy}=\frac{\sum_{i=1}^n(F_{xi}-\bar{F}_x)(F_{yi}-\bar{F}_y)}{\sqrt{\sum_{i=1}^n(F_{xi}-\bar{F}_x)^2}\sqrt{\sum_{i=1}^n(F_{yi}-\bar{F}_y)^2}} \]
- **Final feature fusion formula**: \[ F_{\text{last}}=W_cX+W_dR_1X+W_eX_{\text{hid}}+W_fR_2X_{\text{hid}} \]
- **Prediction formula**: \[ p_t=W_lF_{\text{last}}+b_l \]
- **Loss function (mean squared error, MSE)**, with the normalization by \(|S_t|\) applied per day \(t\): \[ L=\sum_{t\in D}\frac{1}{|S_t|}\sum_{i\in S_t}(p_t^i-g_t^i)^2 \]

Through these improvements, the GRU-PFG model not only achieves a significant improvement in prediction performance but also has stronger generalization ability and higher practicality.
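
The Pearson correlation, feature fusion, prediction, and loss formulas in the summary above can be sketched numerically in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: rows are stocks, so the weight matrices multiply on the right; `R1` is assumed to be the correlation matrix computed from `X` and `R2` the one computed from `X_hid` (the summary does not say how they are built); `X_hid` is a random stand-in for the GRU output; and all function names, shapes, and weights are hypothetical.

```python
import numpy as np


def pearson_matrix(F):
    """Pairwise Pearson correlation between the rows of F (one row per stock)."""
    centered = F - F.mean(axis=1, keepdims=True)      # subtract each row's mean
    norms = np.sqrt((centered ** 2).sum(axis=1, keepdims=True))
    norms = np.where(norms == 0.0, 1.0, norms)        # guard against constant rows
    unit = centered / norms
    return unit @ unit.T                              # shape (n_stocks, n_stocks)


def predict(X, X_hid, W_c, W_d, W_e, W_f, w_l, b_l):
    """Fuse raw and hidden features with their graph-aggregated versions, then predict."""
    R1 = pearson_matrix(X)        # assumption: graph built from the raw factors
    R2 = pearson_matrix(X_hid)    # assumption: graph built from the hidden features
    F_last = X @ W_c + (R1 @ X) @ W_d + X_hid @ W_e + (R2 @ X_hid) @ W_f
    return F_last @ w_l + b_l     # one scalar prediction per stock


def daily_mse(p, g):
    """Mean squared error over the stocks of a single day."""
    return float(np.mean((p - g) ** 2))


def total_loss(preds_by_day, labels_by_day):
    """Sum of per-day MSE values over all days in the dataset."""
    return sum(daily_mse(p, g) for p, g in zip(preds_by_day, labels_by_day))


# Toy data: 5 stocks, 8 input factors, 4 hidden units, 3 fused features.
rng = np.random.default_rng(0)
n_stocks, d_in, d_hid, d_out = 5, 8, 4, 3
X = rng.normal(size=(n_stocks, d_in))
X_hid = rng.normal(size=(n_stocks, d_hid))   # stand-in for the GRU output
W_c, W_d = rng.normal(size=(d_in, d_out)), rng.normal(size=(d_in, d_out))
W_e, W_f = rng.normal(size=(d_hid, d_out)), rng.normal(size=(d_hid, d_out))
w_l, b_l = rng.normal(size=d_out), 0.0

p = predict(X, X_hid, W_c, W_d, W_e, W_f, w_l, b_l)
g = rng.normal(size=n_stocks)                # stand-in for the ground-truth labels
loss = total_loss([p], [g])
```

The products `R1 @ X` and `R2 @ X_hid` are where inter-stock information enters: each stock's features are replaced by a correlation-weighted mix of every stock's features before the linear maps are applied.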