SETN: Stock Embedding Enhanced with Textual and Network Information

Takehiro Takayanagi, Hiroki Sakaji, Kiyoshi Izumi
DOI: https://doi.org/10.1109/BigData55660.2022.10020220
2024-08-06
Abstract: Stock embedding is a method for vector representation of stocks. There is a growing demand for vector representations of stock, i.e., stock embedding, in wealth management sectors, and the method has been applied to various tasks such as stock price prediction, portfolio optimization, and similar fund identifications. Stock embeddings have the advantage of enabling the quantification of relative relationships between stocks, and they can extract useful information from unstructured data such as text and network data. In this study, we propose stock embedding enhanced with textual and network information (SETN) using a domain-adaptive pre-trained transformer-based model to embed textual information and a graph neural network model to grasp network information. We evaluate the performance of our proposed model on related company information extraction tasks. We also demonstrate that stock embeddings obtained from the proposed model perform better in creating thematic funds than those obtained from baseline methods, providing a promising pathway for various applications in the wealth management industry.
Computation and Language; Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses a central problem in quantitative stock analysis: how to extract the relative relationships between stocks from textual and network information and turn them into high-quality stock embeddings. Specifically, the authors propose a new method, Stock Embedding Enhanced with Textual and Network Information (SETN), intended to support wealth management tasks such as stock price prediction, portfolio optimization, and similar fund identification.

### Background and Motivation

1. **Importance of Stock Embeddings**:
   - Stock embeddings represent stocks as vectors, which makes the relative relationships between stocks quantifiable.
   - Stock embeddings are widely applied in the wealth management domain, including stock price prediction, portfolio optimization, and similar fund identification.
2. **Limitations of Existing Methods**:
   - Existing stock embedding methods mainly rely on a single type of data, such as textual data or network data.
   - Although these methods perform well on certain tasks, they do not exploit the complementary strengths of textual and network information.

### Proposed Method

1. **SETN Model**:
   - The SETN model combines a pre-trained Transformer model and a Graph Neural Network (GNN) model to handle textual and network information, respectively.
   - By training these two models jointly, SETN captures both the textual and the network view of each stock.
2. **Specific Steps**:
   - **Step 1**: Extract the subgraph of the target node using network data.
   - **Step 2**: Use a domain-adaptive pre-trained Transformer model to extract textual information from company annual reports.
   - **Step 3**: Use a GNN model to capture network information.
   - **Step 4**: Input the embeddings of the target stock into a classifier for industry and sector classification.

### Experiments and Results

1. **Experimental Setup**:
   - The dataset includes 2,437 companies from the Japanese stock market, divided into training, validation, and test sets.
   - The textual data comes from company annual reports, and the network data comes from causal chains.
2. **Evaluation Metrics**:
   - Mean Average Precision at K (MAP@K) is used to evaluate performance on the related company information extraction task.
   - Performance on specific themes is evaluated through the thematic fund creation task.
3. **Experimental Results**:
   - The SETN model outperforms the baseline models on the related company information extraction task.
   - In the thematic fund creation task, embeddings from the SETN model also perform better than those from the baseline methods at retrieving stocks that share a theme.

### Main Contributions

1. **Proposing the SETN Model**:
   - A stock embedding method that combines textual and network information and improves performance on related company information extraction tasks.
2. **Comparative Study**:
   - A comparative study of different types of graph structures and learning architectures, validating the advantages of directed graphs and joint training.
3. **Application Prospects**:
   - By introducing the thematic fund creation task, the paper demonstrates the application potential of the SETN model in the wealth management domain.

### Conclusion

By proposing the SETN model, this paper addresses the problem of extracting high-quality stock embeddings from textual and network information. Experimental results show that the SETN model outperforms baseline methods in both related company information extraction and thematic fund creation, offering a promising pathway for applications in the wealth management domain.
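
To make the MAP@K evaluation mentioned above more concrete, the following is a minimal sketch of how related companies might be ranked from stock embeddings and then scored. It is not the authors' code. The use of cosine similarity for ranking, the normalisation of average precision by `min(k, number of relevant items)`, the function names, and the toy tickers and vectors are all assumptions made for illustration; the summary does not state which MAP@K variant the paper uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_related(target, embeddings):
    """Rank every other stock by cosine similarity to the target stock."""
    scores = {
        name: cosine_similarity(embeddings[target], vec)
        for name, vec in embeddings.items()
        if name != target
    }
    return sorted(scores, key=scores.get, reverse=True)


def average_precision_at_k(ranked, relevant, k):
    """AP@K, normalised by min(k, number of relevant items) (an assumption)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, name in enumerate(ranked[:k], start=1):
        if name in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / min(k, len(relevant))


def map_at_k(embeddings, related, k):
    """Mean of AP@K over every stock that has a ground-truth related set."""
    scores = [
        average_precision_at_k(rank_related(target, embeddings), relevant, k)
        for target, relevant in related.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical 2-dimensional embeddings for four made-up tickers.
toy_embeddings = {
    "AUTO_A": [1.0, 0.0],
    "AUTO_B": [0.9, 0.1],
    "BANK_A": [0.0, 1.0],
    "BANK_B": [0.1, 0.9],
}
# Hypothetical ground truth: companies in the same made-up industry are related.
toy_related = {
    "AUTO_A": {"AUTO_B"},
    "AUTO_B": {"AUTO_A"},
    "BANK_A": {"BANK_B"},
    "BANK_B": {"BANK_A"},
}

toy_score = map_at_k(toy_embeddings, toy_related, k=2)
print(f"MAP@2 on the toy data: {toy_score:.2f}")
```

On this toy data each stock's nearest neighbour is its industry peer, so every query has an average precision of 1 and the overall MAP@2 is 1.00. The same ranking step could plausibly serve the thematic fund creation task by ranking stocks against a theme's seed stock, though the summary does not describe how the paper does this.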