Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models

Navid Rekabsaz, Mihai Lupu, Artem Baklanov, Allan Hanbury, Alexander Duer, Linda Anderson
DOI: https://doi.org/10.18653/v1/P17-1157
2017-09-28
Abstract: Volatility prediction--an essential concept in financial markets--has recently been addressed using sentiment analysis methods. We investigate the sentiment of annual disclosures of companies in stock markets to forecast volatility. We specifically explore the use of recent Information Retrieval (IR) term weighting models that are effectively extended by related terms using word embeddings. In parallel to textual information, factual market data have been widely used as the mainstream approach to forecast market risk. We therefore study different fusion methods to combine text and market data resources. Our word embedding-based approach significantly outperforms state-of-the-art methods. In addition, we investigate the characteristics of the reports of the companies in different financial sectors.
Information Retrieval; Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
The main problem this paper addresses is predicting stock market volatility from the sentiment expressed in corporate annual reports. Specifically, the authors study how word embedding-based information retrieval (IR) models can be used to analyze the sentiment of annual disclosure documents, and how that textual signal can be combined with market data to improve the accuracy of volatility prediction. The key points of the paper are:

1. **Problem Background**:
   - Volatility in financial markets is an important risk indicator and has a significant impact on the stability of companies.
   - In recent years, a growing number of studies have used sentiment analysis to predict volatility, especially by analyzing text resources such as corporate financial reports, news, forum posts, and earnings conference calls.
2. **Research Objectives**:
   - Explore whether the sentiment in corporate annual disclosure documents (especially the "Risk Factors" section of 10-K reports) can be used to predict stock market volatility.
   - Study how to apply word embedding-based IR models to sentiment analysis to improve prediction accuracy.
   - Explore different fusion methods for combining textual information with market data to further improve prediction.
   - Analyze the sentiment characteristics of reports in different financial sectors and discuss the differences in prediction performance between sectors.
3. **Main Contributions**:
   - Propose a word embedding-based extension of IR term weighting models, which significantly improves the accuracy of volatility prediction.
   - Verify experimentally that combining text and market data gives better predictions over long-term windows.
   - Find that the sentiment characteristics of reports differ significantly across financial sectors, but that the general-purpose model still performs better than sector-specific models.
4. **Methods and Technologies**:
   - Use word embeddings to expand the financial lexicon and improve the traditional IR term weighting schemes.
   - Perform regression with support vector machines (SVM) and artificial neural networks (ANN) to evaluate different feature combinations.
   - Introduce several feature fusion methods, including early fusion, late fusion, and multiple kernel learning (MKL).
5. **Experimental Results**:
   - On a data set of 10-K reports from 2012 to 2015, the extended BM25 weighting scheme combined with the stacking fusion method gives the best prediction performance.
   - Combining text and market data gives more stable predictions across multiple time windows.
   - Prediction performance differs significantly across financial sectors, but the general-purpose model performs better overall.

In conclusion, this paper improves the accuracy of stock market volatility prediction from the sentiment in corporate annual reports by introducing word embedding-based IR models and several feature fusion methods.
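
To make the "Methods and Technologies" item above more concrete, the following is a minimal sketch of how a lexicon term's weight might be extended with related terms from a word embedding and then passed through BM25-style term-frequency saturation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy two-dimensional vectors, the similarity threshold of 0.7, and the `k1` and `b` values are all hypothetical, and the IDF component of BM25 is omitted for brevity.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_terms(term, embeddings, threshold):
    """Map each other vocabulary term to its similarity with `term`,
    keeping only terms at or above the (assumed) threshold."""
    if term not in embeddings:
        return {}
    base = embeddings[term]
    related = {}
    for other, vec in embeddings.items():
        if other == term:
            continue
        sim = cosine(base, vec)
        if sim >= threshold:
            related[other] = sim
    return related


def extended_tf(term, doc_counts, embeddings, threshold):
    """Exact count of `term` plus similarity-weighted counts of related terms."""
    soft = sum(
        sim * doc_counts[other]
        for other, sim in related_terms(term, embeddings, threshold).items()
    )
    return doc_counts[term] + soft


def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency saturation with length normalization (no IDF)."""
    if tf <= 0:
        return 0.0
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return (k1 + 1.0) * tf / (norm + tf)


def document_features(tokens, lexicon, embeddings, avg_doc_len, threshold=0.7):
    """One feature per lexicon term: BM25-saturated extended term frequency."""
    counts = Counter(tokens)
    return [
        bm25_tf(extended_tf(t, counts, embeddings, threshold), len(tokens), avg_doc_len)
        for t in lexicon
    ]


# Toy data (hypothetical): "hazard" lies close to "risk" in the embedding space.
toy_embeddings = {"risk": [1.0, 0.0], "hazard": [0.9, 0.1], "growth": [0.0, 1.0]}
toy_tokens = ["hazard", "hazard", "growth", "risk"]
toy_features = document_features(
    toy_tokens, ["risk", "growth"], toy_embeddings, avg_doc_len=4.0
)
print(toy_features)
```

In this toy document, the two occurrences of "hazard" raise the weight of the lexicon term "risk" even though "risk" itself appears only once. A feature vector of this kind could then be passed to a regressor, either alone or fused with market data features.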