“Big Data” Analysis: Putting the Data Cart Before the Modelling Horse?
Graham D. Barr, Theodor J. Stewart, Brian S. Kantor
DOI: https://doi.org/10.1111/jacf.12298
2018-06-01
Journal of Applied Corporate Finance
Abstract: The statistical analysis of very large data sets, so‐called Big Data or Data Analytics, has become enormously popular in Statistical Analysis and Operations Research. In some cases, such as research into the buying habits of online consumers, the results have come quickly and been very significant. Analysis of other data sets, however, is questionable. One example is time‐series based statistical analysis of stock market and futures prices, often under the descriptive envelope of “neural networks” and “data mining,” and sometimes in combination with historical accounting figures such as earnings and cash flows. The appeal is understandable given the availability of share price data and cheap computer processing power. Nevertheless, the notion that historical data form some sort of repeatable pattern over time, and that complex time series or neural network techniques can then be used to forecast future prices, is hard to justify. Economic modeling, unlike modeling in the pure sciences, necessarily needs to factor in human behavior. The authors cite Lancaster University Professor Michael Pidd, who summarizes six relevant principles: (1) model simple, think complicated; (2) be parsimonious, start small and add; (3) divide and conquer, avoid mega models; (4) use metaphors, analogies and similarities; (5) do not fall in love with data; (6) model building may feel like muddling through. Economic modeling must recognize three key components: (i) the incorporation of human cognitive understanding and experience of the underlying systems, (ii) the use of data to validate emerging models, and (iii) the role of mathematics to ensure internal coherence and logic. Decision‐makers ought to be very skeptical of models that skimp on any one of these three components. The authors emphasize that, rather than Big Data adding value, per se, people add value by creating models that use it.