Potential Updates to Cornfield's 1959 ‘Principles of Research’
Donald B. Rubin
DOI: https://doi.org/10.1002/sim.5431
2012-01-01
Statistics in Medicine
Abstract: This article by Jerome Cornfield is remarkably insightful, especially considering when it was written (published in 1959) [1]; my reading of the literature of that time concerning causal inference suggests that there was considerable confusion about the topic, some of which is still present today, sad to say. Because some of the passages are vividly written, I will be more liberal than usual in quoting them directly. One of the most resonant passages for me is ‘The degree of articulation of a field is measured by the extent to which the phenomena with which the field is concerned are potentially capable of being explained and predicted in terms of a small number of fundamental concepts and constants.’ When I was in high school (class of 1961), I loved physics, and when I entered Princeton to study physics under John Wheeler, this desire for a limited number of fundamental concepts was a given, as was the attitude that stating results, either mathematical or scientific, using unnecessary assumptions was bad and reflected a lack of real understanding. Much of what appears to be currently acceptable in many of our journals is the opposite of this: new jargon for already defined concepts and new ‘principles’ for already stated results, typically cluttered with extraneous notation conveying no insight. This style is not good mathematics, good science, or good statistics, and it does not help our field communicate with other fields, for example, social science or medicine. Cornfield goes on to imply that a field ‘. . . should be ultimately reducible to a small number of fundamental principles and constants, any research not pointed either directly or indirectly at the elucidation of these principles must be considered misguided, and justifiable, if at all, only by short term considerations.’ Right! Perhaps these attitudes were in the air at the time in the same way that randomization was in the air around 1923, as I have noted in Rubin [2].
There is one place where Cornfield seems to fall prey, at least slightly, to the confusion regarding causal inference present in much epidemiology of the time (and today in some circles), which I attribute to a lack of appreciation for the role of potential outcomes [3] in defining causal effects in all situations, not just in the context of randomized experiments [4]. This occurs when he answers the hypothetical question ‘If cigarettes are carcinogenic, why don’t all smokers develop lung cancer?’ by talking rather vaguely about probabilities instead of starting by precisely defining the question being asked. Fundamental to me is defining every causal question by describing the real or hypothetical interventions that would, for example, lead people to smoke or not smoke, and worrying about the plausibility of the stable unit-treatment-value assumption [5] for the resulting potential outcomes defined by these interventions. If these interventions are so amorphous that the associated potential outcomes under each treatment intervention are not conceptually well-defined functions for each unit and each treatment, then the stable unit-treatment-value assumption is not plausible, and we cannot proceed without greater clarity. This is a point I have been making repeatedly for decades [6–8], and more recently in Rubin [9] and in my discussion of the work by Imai et al. in JRSS-A, 2012 [10]. Another place where the explicit introduction of potential outcomes would have sharpened Cornfield’s discussion is when he considers a population of units, only some of which are included in the sample to be randomly divided into treatment groups. Nevertheless, his discussion is still superior to much of what I read today about figuring out the role of ‘mediating variables’.
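As a minimal sketch of what defining the smoking question through potential outcomes looks like, assume a single binary intervention applied to units $i = 1, \dots, N$, with $W_i = 1$ for smoking and $W_i = 0$ for not smoking, and a binary outcome $Y_i$ (1 if lung cancer develops). The symbols $W_i$, $Y_i(\cdot)$, $\tau_i$ and $\bar{\tau}$ are introduced here for illustration only and are not Cornfield's notation. Under the stable unit-treatment-value assumption (no interference between units and no hidden versions of each treatment), each unit has exactly two well-defined potential outcomes:

```latex
% SUTVA: unit i's outcome depends only on its own treatment W_i,
% not on the full assignment vector \mathbf{W} = (W_1, \dots, W_N)
Y_i(\mathbf{W}) = Y_i(W_i), \qquad W_i \in \{0, 1\}

% Unit-level causal effect of smoking versus not smoking
\tau_i = Y_i(1) - Y_i(0)

% Only one potential outcome is ever observed for each unit
Y_i^{\mathrm{obs}} = W_i \, Y_i(1) + (1 - W_i) \, Y_i(0)

% Average causal effect over the N units
\bar{\tau} = \frac{1}{N} \sum_{i=1}^{N} \bigl[ Y_i(1) - Y_i(0) \bigr]
```

On one reading of this notation, ‘carcinogenic’ requires only that $Y_i(1) = 1$ and $Y_i(0) = 0$ for some units, not that $Y_i(1) = 1$ for every smoker, so the hypothetical question dissolves once the comparison is stated precisely.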
His discussion of topics such as ‘blinding’ would, again, have been sharper had he had access to the concept of potential outcomes, so that he could have precisely defined the mathematical assumptions that blinding is designed to justify: the ‘exclusion restrictions’ explicated in Angrist et al. [11] and Hirano et al. [12]. I was particularly interested in Cornfield’s discussion of the ‘retrospective’ or ‘case–control’ study. Again, I think the discussion would have been more precise with the use of potential outcomes, as in Holland and Rubin [13], but the use of potential outcomes outside the context of randomized experiments was still more than a decade away [4]. For example, there are two distinct issues underlying the analysis of such data: the critical assumption of a hypothetical unconfounded treatment assignment mechanism, and the fact that the sampling mechanism is explicitly confounded because it uses observed values of