Forecasting Cryptocurrencies Log-Returns: a LASSO-VAR and Sentiment Approach

Federico D'Amario, Milos Ciganovic
DOI: https://doi.org/10.48550/arXiv.2210.00883
2022-09-22
Abstract: Cryptocurrencies have become a trendy topic recently, primarily due to their disruptive potential and reports of unprecedented returns. In addition, academics increasingly acknowledge the predictive power of Social Media in many fields and, more specifically, for financial markets and economics. In this paper, we leverage the predictive power of Twitter and Reddit sentiment together with Google Trends indexes and volume to forecast the log returns of ten cryptocurrencies. Specifically, we consider $Bitcoin$, $Ethereum$, $Tether$, $Binance Coin$, $Litecoin$, $Enjin Coin$, $Horizen$, $Namecoin$, $Peercoin$, and $Feathercoin$. We evaluate the performance of LASSO-VAR using daily data from January 2018 to January 2022. In a 30-day recursive forecast, we can retrieve the correct direction of the actual series more than 50% of the time. We compare this result with the main benchmarks, and we see a 10% improvement in Mean Directional Accuracy (MDA). The use of sentiment and attention variables as predictors significantly increases the forecast accuracy in terms of MDA but not in terms of Root Mean Squared Errors. We perform a Granger causality test using a post-double LASSO selection for high-dimensional VARs. Results show no "causality" from Social Media sentiment to cryptocurrency returns.
Statistical Finance, Machine Learning, Econometrics
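
The abstract reports results in terms of log returns, Root Mean Squared Error, and Mean Directional Accuracy. The following is a minimal sketch of those three quantities, not the authors' code. The function names are hypothetical, and the MDA definition used here (the share of periods in which the forecast return has the same sign as the realised return) is an assumption; the paper's exact definition may differ.

```python
import numpy as np


def log_returns(prices):
    """Log returns r_t = ln(P_t) - ln(P_{t-1}) from a price series."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))


def rmse(actual, forecast):
    """Root mean squared error between realised and forecast returns."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def mda(actual, forecast):
    """Share of periods where forecast and realised returns share a sign.

    Assumption: because the target is already a return, "direction" is
    taken to be the sign of the return itself.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.sign(actual) == np.sign(forecast)))


# Illustrative numbers only, not data from the paper.
actual = [0.02, -0.01, 0.03, -0.04]
forecast = [0.01, 0.02, 0.01, -0.01]
print("RMSE:", rmse(actual, forecast))
print("MDA: ", mda(actual, forecast))
```

The two metrics can disagree: a forecast that always gets the sign right but misjudges the magnitude scores well on MDA and poorly on RMSE, which is consistent with the abstract's finding that sentiment variables help one metric but not the other.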
What problem does this paper attempt to address?
The paper asks whether social media sentiment and search-engine attention data can forecast the log returns of cryptocurrencies. Specifically, the authors use Twitter and Reddit sentiment, together with Google Trends indexes and trading volume, to forecast the log returns of ten cryptocurrencies (Bitcoin, Ethereum, Tether, Binance Coin, Litecoin, Enjin Coin, Horizen, Namecoin, Peercoin, and Feathercoin) with a LASSO-VAR model. The main goal is to evaluate whether these sentiment and attention variables improve forecast accuracy, particularly directional accuracy (that is, whether the forecast gets the direction of the price move right).

### Main research questions:

1. **Predictive ability of sentiment and attention variables**: The study tests whether social media sentiment and search-engine data are effective predictors of cryptocurrency log returns.
2. **Model performance comparison**: The study compares the LASSO-VAR model against benchmark models (such as LBVAR and FAVAR) in forecasting cryptocurrency log returns.
3. **Granger causality test**: The study examines Granger causality between social media sentiment and cryptocurrency returns to determine whether sentiment has predictive content for returns.

### Research methods:

- **Data collection**: Daily data for ten cryptocurrencies from January 2018 to January 2022, including Google Trends indexes, Twitter and Reddit sentiment, and trading volume.
- **Model construction**: A LASSO-VAR model produces the forecasts, with the regularization parameter λ selected by time-series cross-validation.
- **Performance evaluation**: Forecasts are scored by Root Mean Squared Error (RMSE) and Mean Directional Accuracy (MDA).
- **Granger causality test**: A high-dimensional Granger causality test is used to analyze the relationship between social media sentiment and cryptocurrency returns.

### Main findings:

- **Prediction performance**: The LASSO-VAR model outperforms the benchmark models in Mean Directional Accuracy (MDA), with an average improvement of 10%. For stablecoins (such as USDT), the homoscedastic model performs comparably to FGLS-VAR.
- **The role of sentiment data**: Adding sentiment and attention variables significantly improves directional accuracy (MDA) but yields no clear improvement in RMSE.
- **Granger causality**: The results show no Granger causality from social media sentiment to cryptocurrency returns, although Granger causality is found among different cryptocurrencies.

In conclusion, adding social media sentiment and search-engine data improves the directional accuracy of log-return forecasts for these cryptocurrencies, but the paper finds no Granger causality running from sentiment to returns.
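
The model-construction step described above (a LASSO-penalized VAR with λ chosen by time-series cross-validation) can be sketched as follows. This is an illustrative approximation under stated assumptions, not the authors' implementation: it fits one scikit-learn `LassoCV` regression per equation, so each equation gets its own λ, and it uses `TimeSeriesSplit` so that validation folds always come after their training folds. The helper names (`build_lagged`, `fit_lasso_var`, `forecast_one_step`) and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit


def build_lagged(Y, p):
    """Return (X, targets) where row t of X holds [Y[t-1], ..., Y[t-p]]."""
    T = Y.shape[0]
    X = np.hstack([Y[p - lag: T - lag] for lag in range(1, p + 1)])
    return X, Y[p:]


def fit_lasso_var(Y, p=1, n_splits=5):
    """Fit a VAR(p) equation by equation with an L1 penalty.

    Assumption: lambda is tuned separately for each equation, using
    forward-chaining folds that respect the time ordering.
    """
    X, targets = build_lagged(Y, p)
    cv = TimeSeriesSplit(n_splits=n_splits)
    return [
        LassoCV(cv=cv, max_iter=10_000).fit(X, targets[:, j])
        for j in range(targets.shape[1])
    ]


def forecast_one_step(models, Y, p=1):
    """One-step-ahead forecast of every series from the last p observations."""
    x_new = np.hstack([Y[-lag] for lag in range(1, p + 1)]).reshape(1, -1)
    return np.array([m.predict(x_new)[0] for m in models])


# Simulated sparse VAR(1); purely illustrative, not the paper's data.
rng = np.random.default_rng(0)
k, T = 4, 300
A = 0.5 * np.eye(k)
A[0, 1] = 0.3  # series 1 feeds into series 0
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.normal(scale=0.1, size=k)

models = fit_lasso_var(Y, p=1)
print("selected lambdas:", [m.alpha_ for m in models])
print("one-step forecast:", forecast_one_step(models, Y, p=1))
```

In a setting like the paper's, the columns of `Y` would hold the returns together with the sentiment, Google Trends, and volume series, typically standardized before fitting so that the penalty treats all predictors on a comparable scale. A recursive forecast would refit or roll the window forward one day at a time.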