Social media sentiment, model uncertainty, and volatility forecasting

Steven Lehrer, Tian Xie, Xinyu Zhang
DOI: https://doi.org/10.1016/j.econmod.2021.105556
IF: 4.7
2021-09-01
Economic Modelling
Abstract: Many economic indicators, including consumer confidence indices used to forecast volatility or macroeconomic outcomes, are published with a considerable time lag. To obtain a timelier measure of consumer sentiment, many central banks and economic researchers are turning to state-of-the-art text sentiment analysis tools. We examine whether there are benefits for forecasting volatility from (i) incorporating a sentiment measure derived using deep learning from Twitter messages at the one-minute level, and (ii) acknowledging specification uncertainty of the lag index in the heterogeneous autoregression (HAR) model. We present evidence from an out-of-sample forecasting exercise suggesting that including social media sentiment can significantly improve the forecasting accuracy of a popular volatility index, particularly at short time horizons. Further, our results document large gains in predictive accuracy from a newly proposed estimator that allows for model uncertainty in the specification of the lag index when using a HAR estimator.
economics
What problem does this paper attempt to address?
The paper addresses two questions. The first is how to improve volatility forecasting with sentiment indicators extracted from Twitter messages. The second is how to handle uncertainty about the lag index of the volatility model, which it tackles with a new model-averaging estimator. Specifically, the paper focuses on two aspects:

1. **Introducing social media sentiment indicators**:
   - Many economic indicators used to forecast volatility or macroeconomic outcomes are published with a considerable time lag. Consumer confidence indices are one example. To obtain a timelier measure of consumer sentiment, many central banks and economic researchers have turned to advanced text sentiment analysis tools.
   - The paper examines the benefits of adding a one-minute-level sentiment indicator to the volatility forecasting model. The indicator is extracted from Twitter messages with deep learning.
2. **Dealing with model uncertainty in the lag index**:
   - The paper proposes a new model-averaging estimator (MAHAR). It accounts for uncertainty about the lag index in the heterogeneous autoregressive (HAR) model.
   - By averaging across candidate models, the estimator can exploit complementary information in different specifications and thereby improve forecast accuracy.

### Main contributions

1. **Empirical analysis**:
   - An out-of-sample forecasting exercise shows that incorporating social media sentiment can significantly improve the accuracy of volatility forecasts, especially at short horizons.
   - The newly proposed MAHAR estimator outperforms other HAR-type estimators in forecast accuracy.
2. **Theoretical contributions**:
   - A new method for handling uncertainty about the lag index is proposed: the model-averaging HAR (MAHAR) estimator.
   - The method handles the aggregation of high-frequency data. It also accounts for model uncertainty in the model-selection process.

### Method overview

1. **HAR model**:
   - The HAR model of Corsi (2009) is used to model the VIX index. It specifies the h-step-ahead daily volatility \( y_{t+h} \) as
     \[ y_{t+h}=\beta_0+\beta_d y_t^{(1)}+\beta_w y_t^{(5)}+\beta_m y_t^{(22)}+\epsilon_{t+h} \]
     where \( y_t^{(l)} \) is the average over the most recent \( l \) days. Setting \( l = 1, 5, 22 \) gives the daily, weekly, and monthly components:
     \[ y_t^{(l)}=\frac{1}{l}\sum_{s=0}^{l-1}y_{t-s} \]
2. **Model-averaging HAR (MAHAR)**:
   - A new model-averaging estimator is proposed. It improves forecast accuracy by combining the forecasts of multiple candidate models.
   - The MAHAR forecast is
     \[ \hat{y}_{T+1}(w)=w^\top\hat{y}_{T+1}=\sum_{m=1}^M w_m\hat{y}_{m,T+1} \]
     Here \( \hat{y}_{m,T+1} \) is the forecast from candidate model \( m \), and \( \hat{y}_{T+1} \) stacks the \( M \) candidate forecasts. The weight vector \( w \) satisfies \( w\in\mathcal{W}=\{w\in[0,1]^M:\sum_{m=1}^M w_m=1\} \).
3. **Monte Carlo simulation**:
   - The finite-sample performance of MAHAR is studied through Monte Carlo simulation.
   - The simulations show that MAHAR performs well across sample sizes and forecast horizons, especially at short horizons.

### Conclusion

The paper significantly improves the accuracy of volatility forecasts in two ways: it introduces social media sentiment indicators, and it proposes a new model-averaging estimator. In particular, the MAHAR estimator performs well in handling uncertainty about the lag index, providing a useful tool for financial risk management.
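The HAR regression described in the method overview can be sketched in plain Python. This is a minimal illustration, not the authors' code. The helper names `har_average`, `har_design`, `ols`, and `har_forecast` are hypothetical. Estimation is assumed to be ordinary least squares via the normal equations, and the lag set defaults to (1, 5, 22). The paper's sentiment regressor is omitted.

```python
from typing import List, Sequence, Tuple


def har_average(y: Sequence[float], t: int, l: int) -> float:
    """y_t^{(l)}: mean of the l most recent observations y_t, ..., y_{t-l+1}."""
    if t - l + 1 < 0:
        raise ValueError("not enough history for this lag window")
    return sum(y[t - s] for s in range(l)) / l


def har_design(y: Sequence[float], h: int,
               lags: Sequence[int] = (1, 5, 22)) -> Tuple[List[List[float]], List[float]]:
    """Rows [1, y_t^(l1), y_t^(l2), ...] paired with targets y_{t+h}."""
    start = max(lags) - 1  # first t with a full window for the longest lag
    X: List[List[float]] = []
    Y: List[float] = []
    for t in range(start, len(y) - h):
        X.append([1.0] + [har_average(y, t, l) for l in lags])
        Y.append(y[t + h])
    return X, Y


def ols(X: List[List[float]], Y: List[float]) -> List[float]:
    """Least squares via the normal equations (X'X) b = X'Y and Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yv for r, yv in zip(X, Y)) for i in range(k)]
    # Forward elimination with partial pivoting on the augmented system [A | c].
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def har_forecast(b: Sequence[float], y: Sequence[float],
                 lags: Sequence[int] = (1, 5, 22)) -> float:
    """h-step-ahead forecast built from the latest observation t = len(y) - 1."""
    t = len(y) - 1
    x = [1.0] + [har_average(y, t, l) for l in lags]
    return sum(bi * xi for bi, xi in zip(b, x))
```

With \( l = 1 \) the average reduces to \( y_t \) itself, which is why the window in `har_average` starts at \( s = 0 \). A sentiment measure could be incorporated by appending its value as an extra column in each row of `har_design`. That is an illustrative assumption, not the paper's exact specification.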
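The MAHAR combination step can be illustrated with a small sketch. It assumes each candidate HAR specification (for example, models with different hypothetical lag sets) has already produced forecasts on a validation window. It then picks weights in \( \mathcal{W} \) by minimizing validation mean squared error over a coarse grid on the simplex. Both the criterion and the grid search are assumptions made for illustration; the paper's own weight-selection rule is not described here. The names `compositions`, `combine`, `choose_weights`, and `grid` are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for k in range(total + 1):
        for rest in compositions(total - k, parts - 1):
            yield (k,) + rest


def combine(w: Sequence[float], forecasts: Sequence[float]) -> float:
    """Averaged forecast sum_m w_m * yhat_m, i.e. the MAHAR combination formula."""
    return sum(wm * fm for wm, fm in zip(w, forecasts))


def choose_weights(F: Sequence[Sequence[float]], Y: Sequence[float],
                   grid: int = 100) -> Tuple[List[float], float]:
    """Grid search over W = {w in [0,1]^M : sum(w) = 1} with step 1/grid.

    F[i][m] is candidate model m's forecast of validation target Y[i].
    Returns the weights with the lowest validation MSE (an assumed criterion).
    """
    M = len(F[0])
    best_w: List[float] = []
    best_loss = float("inf")
    for comp in compositions(grid, M):
        w = [c / grid for c in comp]  # every grid point lies on the simplex
        loss = sum((yi - combine(w, row)) ** 2 for row, yi in zip(F, Y)) / len(Y)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss
```

The grid has \( \binom{\text{grid}+M-1}{M-1} \) points, so it grows quickly with the number of candidate models. For larger \( M \), a quadratic-programming solver over \( \mathcal{W} \) would be the natural replacement for the exhaustive search.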