Causality Analysis Between Time Series - A Rigorous Approach
X. San Liang
2014-01-01
Abstract: In their recent book, O'Neil and Schutt (2013) said: "One of the biggest statistical challenges, from both a theoretical and practical perspective, is establishing a causal relationship between two variables." Given two time series, can one tell, in a faithful and quantitative way, the cause and effect between them? Based on a recently established rigorous formalism of information flow, namely the Liang-Kleeman information flow (see Liang, 2013, for a review), Liang (2014) arrives at a formula that gives this important and challenging question a positive answer. Here causality is measured by the time rate of change of information flowing from one variable, say $X_2$, to another, $X_1$. If the evolution of $(X_1, X_2)$ is governed by

$$dX_1 = F_1\,dt + b_{11}\,dW_1 + b_{12}\,dW_2, \tag{1}$$
$$dX_2 = F_2\,dt + b_{21}\,dW_1 + b_{22}\,dW_2, \tag{2}$$

where $W_i$ ($i = 1, 2$) is white noise, Liang (2008) has established that the flow rate from $X_2$ to $X_1$ is

$$T_{2\to1} = -E\left(\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right) + \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2\left[(b_{11}^2 + b_{12}^2)\rho_1\right]}{\partial x_1^2}\right), \tag{3}$$

where $\rho_1$ is the marginal probability density of $X_1$, and $E$ the mathematical expectation. $T_{2\to1}$ can be zero or nonzero. A nonzero $T_{2\to1}$ means that $X_2$ is causal to $X_1$: a positive value means that $X_2$ makes $X_1$ more uncertain, while a negative value means that it makes $X_1$ less uncertain. This measure is asymmetric between $X_1$ and $X_2$; in particular, if the process underlying $X_1$ has nothing to do with $X_2$, then the resulting causality from $X_2$ to $X_1$ vanishes. In practice the dynamics are unknown; instead we are given two time series. In this case, the Liang-Kleeman information flow can still be obtained, with the terms in (3) replaced by their respective estimators.
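The vanishing-causality property stated above can be checked directly from (3). The following is only a sketch, under the assumption (not stated in the abstract) that $F_1$ and $b_{11}^2 + b_{12}^2$ do not depend on $x_2$ and that $\rho_1$ and its derivative decay to zero at the boundaries of the domain. Here $\rho(x_1, x_2)$ denotes the joint density, whose integral over $x_2$ is $\rho_1$.

```latex
\begin{align*}
E\left(\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right)
  &= \iint \rho(x_1,x_2)\,\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\,dx_1\,dx_2
   = \int \frac{\partial (F_1\rho_1)}{\partial x_1}\,dx_1 = 0, \\
E\left(\frac{1}{\rho_1}\frac{\partial^2\left[(b_{11}^2+b_{12}^2)\rho_1\right]}{\partial x_1^2}\right)
  &= \int \frac{\partial^2\left[(b_{11}^2+b_{12}^2)\rho_1\right]}{\partial x_1^2}\,dx_1 = 0,
\end{align*}
```

so both terms of (3) vanish and $T_{2\to1} = 0$ under these assumptions.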
In a linear setting, it is proved (Liang, 2014) that

$$T_{2\to1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2}, \tag{4}$$

where $C_{ij}$ is the sample covariance between $X_i$ and $X_j$, $C_{i,dj}$ the covariance between $X_i$ and $\dot{X}_j$, and $\dot{X}_j$ the finite-difference approximation of $dX_j/dt$ using the Euler forward scheme. The formula is compact in form and very easy to compute, in sharp contrast to other information-theoretic approaches. Moreover, a statistical significance test can be performed for each estimated $T_{2\to1}$. As validation, we have shown that this formula faithfully unravels the cause-effect relations between several touchstone series (both linear and nonlinear) purportedly generated with one-way causality, while traditional approaches fail in this regard. An example system shown in Liang (2014) is

$$dX_1 = (-X_1 + 0.5X_2)\,dt + 0.1\,dW_1, \tag{5}$$
$$dX_2 = -X_2\,dt + 0.1\,dW_2. \tag{6}$$

Clearly $X_2$ drives $X_1$, but $X_1$ does not feed back. With this system we generate a sample path and plot the series in Fig. 1. Application of the above formula yields a pair of flow rates, $T_{2\to1} \approx 0.11$ and $T_{1\to2} \approx 0$, a remarkable result that accurately recovers the causality between $X_1$ and $X_2$. The formula has also been applied to real-world problems; one example is the cause-effect relation between the two major climate modes, El Niño and the Indian Ocean Dipole, which have been linked to hazards in far-flung regions of the globe, with important results that would otherwise be difficult, if not impossible, to obtain.

[Figure 1: sample path of $X_1$ and $X_2$ generated from system (5)-(6).]
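To make formula (4) concrete, the following is a minimal Python sketch that simulates system (5)-(6) with an Euler-Maruyama scheme and then evaluates (4) in both directions. It is an illustration under stated assumptions, not the author's code: the function names (`cov`, `information_flow`, `simulate`), the step size 0.01, the series length, the zero initial state, and the random seed are all choices made here and do not come from the abstract.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def information_flow(source, target, dt):
    """Estimate T_{source -> target} via formula (4); target plays X_1, source X_2."""
    # Euler forward difference approximating d(target)/dt.
    d1 = [(target[k + 1] - target[k]) / dt for k in range(len(target) - 1)]
    x1, x2 = target[:-1], source[:-1]  # drop last point to align with d1
    c11, c12, c22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    c1d1, c2d1 = cov(x1, d1), cov(x2, d1)
    return (c11 * c12 * c2d1 - c12**2 * c1d1) / (c11**2 * c22 - c11 * c12**2)


def simulate(n_steps, dt, seed=0):
    """Euler-Maruyama sample path of system (5)-(6), started at the origin."""
    rng = random.Random(seed)
    amp = 0.1 * math.sqrt(dt)  # noise amplitude per time step
    x1, x2 = 0.0, 0.0
    path1, path2 = [x1], [x2]
    for _ in range(n_steps):
        # Both updates use the values from the previous step.
        x1, x2 = (
            x1 + (-x1 + 0.5 * x2) * dt + amp * rng.gauss(0.0, 1.0),
            x2 - x2 * dt + amp * rng.gauss(0.0, 1.0),
        )
        path1.append(x1)
        path2.append(x2)
    return path1, path2


# Assumed settings (not from the abstract): step 0.01, one million steps.
DT = 0.01
series1, series2 = simulate(1_000_000, DT)
t21 = information_flow(series2, series1, DT)  # flow from X_2 to X_1
t12 = information_flow(series1, series2, DT)  # flow from X_1 to X_2
print(f"T(2->1) = {t21:.3f}, T(1->2) = {t12:.3f}")
```

With a long enough sample path, the two estimates should land near the values quoted above (about 0.11 and about 0), up to sampling error. A shorter series gives noticeably noisier estimates, which is where the significance test mentioned in the text becomes relevant.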