Abstract: A number of recent studies in the economics literature have focused on the usefulness of factor models for prediction using "big data". We add to this literature by analyzing whether "big data" are useful for modelling low-frequency macroeconomic variables such as unemployment, inflation and GDP. In particular, we analyze the predictive benefits associated with the use of principal component analysis (PCA), independent component analysis (ICA), and sparse principal component analysis (SPCA). We also evaluate machine learning, variable selection and shrinkage methods, including bagging, boosting, ridge regression, least angle regression, the elastic net, and the non-negative garrote. Our approach is to carry out a forecasting "horse race" among prediction models constructed using a variety of model specification approaches, factor estimation methods, and data windowing methods, in the context of the prediction of 11 macroeconomic variables relevant for monetary policy assessment. In many instances, we find that several of our benchmark models, including autoregressive (AR) models, AR models with exogenous variables, and (Bayesian) model averaging, do not dominate specifications that combine factor-type dimension reduction with machine learning, variable selection, and shrinkage methods (called "combination" models). We find that forecast combination methods are mean square forecast error (MSFE) "best" for only 3 of 11 variables when the forecast horizon is h=1, and for 4 variables when h=3 or h=12. Additionally, non-PCA factor estimation methods yield MSFE-best predictions for 9 of 11 variables when h=1, although PCA dominates at longer horizons. Interestingly, we also find evidence of the usefulness of combination models for approximately half of our variables when h>1.
Most importantly, we present strong new evidence of the usefulness of factor-based dimension reduction when utilizing "big data" for macroeconometric forecasting.
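The sketch below illustrates, on a very small scale, the kind of comparison the abstract describes. It runs a two-model horse race on simulated data:

- a direct h-step AR benchmark estimated by OLS;
- a "combination"-style model that extracts principal-component factors from a large panel and fits the forecasting regression by ridge regression.

This is not the authors' code or specification. The simulated data, the helper names (`pca_factors`, `ridge_fit`, `horse_race`), the number of factors `k`, the ridge penalty `lam` and the recursive-window start `first_origin` are all hypothetical choices made for illustration. Both models are re-estimated at every forecast origin using only data available at that date. MSFE is taken to be the average of the squared out-of-sample errors, (1/P) Σ (y_{t+h} − ŷ_{t+h|t})².

```python
import numpy as np


def pca_factors(X, k):
    """First k principal-component factors of a T x N panel (columns standardized)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    # Normalization F'F / T = I; factor signs are arbitrary, which is harmless in a regression.
    return np.sqrt(Z.shape[0]) * U[:, :k]


def ridge_fit(W, y, lam):
    """Ridge coefficients with an unpenalized intercept prepended to W."""
    W1 = np.column_stack([np.ones(len(W)), W])
    penalty = lam * np.eye(W1.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(W1.T @ W1 + penalty, W1.T @ y)


def ols_fit(W, y):
    """OLS coefficients with an intercept prepended to W."""
    W1 = np.column_stack([np.ones(len(W)), W])
    return np.linalg.lstsq(W1, y, rcond=None)[0]


def predict(beta, w):
    return beta[0] + w @ beta[1:]


def horse_race(X, y, h=1, k=2, lam=1.0, first_origin=100):
    """Recursive (expanding) window, direct h-step forecasts.

    Returns (MSFE of AR benchmark, MSFE of PCA-factor + ridge model).
    Hypothetical helper for illustration only.
    """
    err_ar, err_fr = [], []
    for t in range(first_origin, len(y) - h):
        F = pca_factors(X[: t + 1], k)      # factors estimated with data up to origin t only
        lhs = y[h : t + 1]                   # targets y_{s+h} for s = 0..t-h
        # AR benchmark: regress y_{s+h} on y_s by OLS
        b_ar = ols_fit(y[: t + 1 - h, None], lhs)
        err_ar.append(y[t + h] - predict(b_ar, y[t : t + 1]))
        # Combination-style model: regress y_{s+h} on [F_s, y_s] by ridge
        W = np.column_stack([F, y[: t + 1]])
        b_fr = ridge_fit(W[: t + 1 - h], lhs, lam)
        err_fr.append(y[t + h] - predict(b_fr, W[t]))
    return np.mean(np.square(err_ar)), np.mean(np.square(err_fr))


# Simulated "big data" panel: 2 latent AR(1) factors drive 50 observed series,
# and the target y depends on last period's factors (assumed design).
rng = np.random.default_rng(0)
T, N = 200, 50
F_true = np.zeros((T, 2))
for t in range(1, T):
    F_true[t] = 0.5 * F_true[t - 1] + rng.standard_normal(2)
X = F_true @ rng.standard_normal((2, N)) + rng.standard_normal((T, N))
y = np.zeros(T)
y[1:] = F_true[:-1].sum(axis=1) + 0.5 * rng.standard_normal(T - 1)

msfe_ar, msfe_fr = horse_race(X, y, h=1)
print(f"MSFE AR: {msfe_ar:.3f}  factor+ridge: {msfe_fr:.3f}  ratio: {msfe_fr / msfe_ar:.3f}")
```

Because the target is generated from the latent factors, the factor-plus-ridge model should beat the AR benchmark here by construction. This is not evidence about real macroeconomic data. Other combination variants could be built by replacing `pca_factors` with an ICA or sparse PCA step, or by replacing `ridge_fit` with another shrinkage estimator.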

Dynamic Sparse Factor Analysis

The sparse dynamic factor model: a regularised quasi-maximum likelihood approach

sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings

Sparse Asymptotic PCA: Identifying Sparse Latent Factors Across Time Horizon

Sparse Bayesian factor analysis when the number of factors is unknown

Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity

Estimation of Sparsity-Induced Weak Factor Models

Bayesian factor-adjusted sparse regression

Bayesian Dynamic Factor Models for High-dimensional Matrix-valued Time Series

Path and Direction Discovery in Individual Dynamic Factor Models: A Regularized Hybrid Unified Structural Equation Modeling with Latent Variable

Mining Big Data Using Parsimonious Factor, Machine Learning, Variable Selection and Shrinkage Methods

Emergent and Spontaneous Computation of Factor Relationships from A Large Factor Set

Dynamic Factor Models for Multivariate Count Data: An Application to Stock-Market Trading Activity

Disentangling Structural Breaks in Factor Models for Macroeconomic Data

Econometric Analysis of Large Factor Models

Dynamic Poisson Factorization

State-Varying Factor Models of Large Dimensions

Deep Markov Spatio-Temporal Factorization

Comparisons of Four Methods for Estimating a Dynamic Factor Model

Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm

Modeling High-Dimensional Time Series: A Factor Model With Dynamically Dependent Factors and Diverging Eigenvalues