HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information

Wentao Xu, Weiqing Liu, Lewen Wang, Yingce Xia, Jiang Bian, Jian Yin, Tie-Yan Liu
DOI: https://doi.org/10.48550/arXiv.2110.13716
2022-01-20
Abstract: Stock trend forecasting, which forecasts stock prices' future trends, plays an essential role in investment. The stocks in a market can share information so that their stock prices are highly correlated. Several methods were recently proposed to mine the shared information through stock concepts (e.g., technology, Internet Retail) extracted from the Web to improve the forecasting results. However, previous work assumes the connections between stocks and concepts are stationary, and neglects the dynamic relevance between stocks and concepts, limiting the forecasting results. Moreover, existing methods overlook the invaluable shared information carried by hidden concepts, which measure stocks' commonness beyond the manually defined stock concepts. To overcome the shortcomings of previous work, we propose a novel stock trend forecasting framework that can adequately mine the concept-oriented shared information from predefined concepts and hidden concepts. The proposed framework simultaneously utilizes the stock's shared information and individual information to improve the stock trend forecasting performance. Experimental results on the real-world tasks demonstrate the efficiency of our framework on stock trend forecasting. The investment simulation shows that our framework can achieve a higher investment return than the baselines.
Statistical Finance, Machine Learning
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper aims to solve several key problems in stock trend prediction:

1. **Dynamic Correlation**:
   - Existing methods that build Graph Neural Network (GNN) models usually assume that the connections between stocks and concepts are static, so the information propagation pattern is fixed. In the actual financial market, however, the correlation between a stock and its concepts changes over time. For example, during the COVID-19 pandemic, Amazon's correlation with its "e-commerce" concept increased significantly, while its correlation with "cloud computing" was relatively weak.
   - Formula representation: Let \( \alpha_{k,i}^t \) represent the degree of correlation between stock \( i \) and concept \( k \) at time \( t \). Dynamic correlation means that \( \alpha_{k,i}^t \) changes with \( t \).

2. **Mining of Hidden Concepts**:
   - Existing methods mainly rely on predefined concepts (such as industries or lines of business) and ignore hidden concepts. Hidden concepts can capture common features among stocks that human experts have not predefined, such as temporary associations caused by a public health emergency like the COVID-19 pandemic.
   - Formula representation: Let \( H_k \) represent hidden concept \( k \) and \( u_k^t \) its representation at time \( t \). The process of mining hidden concepts can then be represented as:

     \[ u_k^t=\text{LeakyReLU}\left(W_u\sum_{i\in M_k^t}\gamma_{k,i}^t x_i^{t,1}+b_u\right) \]

     where \( \gamma_{k,i}^t \) is the cosine similarity between stock \( i \) and hidden concept \( k \).

3. **Fusion of Shared and Individual Information**:
   - The trend of a stock is affected not only by the information it shares with other stocks but also by its own individual information. Existing methods often overlook the individual part, which hurts prediction quality.
   - Formula representation: Let \( p_i^t \) represent the final prediction for stock \( i \) at time \( t \). It can be represented as:

     \[ p_i^t = W_p\left(\hat{y}_{i,0}^t+\hat{y}_{i,1}^t+\hat{y}_{i,2}^t\right)+b_p \]

     where \( \hat{y}_{i,0}^t \), \( \hat{y}_{i,1}^t \) and \( \hat{y}_{i,2}^t \) represent the prediction outputs of the predefined concept module, the hidden concept module and the individual information module respectively.

### Summary

To overcome the above problems, the authors propose a graph-based framework (HIST) that dynamically mines the shared information of predefined and hidden concepts and combines it with each stock's individual information to improve the accuracy of stock trend prediction. The experimental results show that this framework outperforms existing methods on multiple evaluation metrics and achieves a higher rate of return in investment simulations.
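The two formulas above (the hidden concept representation \( u_k^t \) and the fused prediction \( p_i^t \)) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the plain-list vector format, the LeakyReLU slope of 0.01, and the choice to pass the hidden concept's current vector in as an argument are all assumptions made for illustration, and the weights are toy values rather than learned parameters.

```python
import math


def leaky_relu(vec, slope=0.01):
    # Element-wise LeakyReLU; the slope value is an assumption, not from the source.
    return [x if x > 0 else slope * x for x in vec]


def cosine(a, b):
    # Cosine similarity between two vectors; returns 0.0 if either has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def affine(W, x, b):
    # Computes W x + b for a matrix W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def hidden_concept_repr(member_vecs, concept_vec, W_u, b_u):
    # u_k = LeakyReLU(W_u * sum_i gamma_{k,i} * x_i + b_u), where gamma_{k,i}
    # is the cosine similarity between stock i and hidden concept k.
    # concept_vec (the concept's current vector) is a hypothetical input here.
    agg = [0.0] * len(concept_vec)
    for x in member_vecs:
        gamma = cosine(x, concept_vec)
        for d, value in enumerate(x):
            agg[d] += gamma * value
    return leaky_relu(affine(W_u, agg, b_u))


def fuse_prediction(y0, y1, y2, W_p, b_p):
    # p_i = W_p (y0 + y1 + y2) + b_p, with W_p as a single row so p_i is a scalar.
    summed = [a + b + c for a, b, c in zip(y0, y1, y2)]
    return affine([W_p], summed, [b_p])[0]


if __name__ == "__main__":
    # Toy inputs: two stocks in one hidden concept, identity weights.
    u = hidden_concept_repr(
        member_vecs=[[1.0, 0.0], [0.0, 1.0]],
        concept_vec=[1.0, 0.0],
        W_u=[[1.0, 0.0], [0.0, 1.0]],
        b_u=[0.0, -1.0],
    )
    p = fuse_prediction([0.1, 0.2], [0.3, 0.0], [-0.1, 0.1], W_p=[1.0, 1.0], b_p=0.5)
    print("hidden concept representation:", u)
    print("fused prediction:", p)
```

In the toy call, the second stock is orthogonal to the concept vector, so its cosine weight is zero and it contributes nothing to the aggregate. This mirrors how \( \gamma_{k,i}^t \) scales each stock's contribution to a hidden concept.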