Appraising Evidence on Program Effectiveness: Do Virginity Pledges Cause Virginity? (original critique)
In a recent publication on appraising evidence on program effectiveness (Constantine & Braverman, 2004), we used the Bearman and Brueckner (2001) virginity pledge study to illustrate the problem of dubious claims of program effectiveness that are erroneously based on non-experimental correlational data. After reviewing the logical and statistical basis of the authors' claims regarding the effect of virginity pledging on delaying sexual intercourse, we concluded that "... in considering the original question--Do virginity pledges cause initiation of sexual intercourse to be delayed?--the answer remains that they might or might not. This particular study adds little or nothing to our knowledge of this wished-for effect." (p. 241)
The following text is the full critique excerpted from Constantine and Braverman (2004). For comments and questions about this critique, and our replies, see the critique FAQ.
Causation, Correlation, and Alternative Explanations
The essential nature of causation and the types of evidence necessary to demonstrate a causal relation, such as the effect of a program on an outcome, have long been debated (e.g., Bunge, 1979; Mackie, 1980; McKim and Turner, 1997). In spite of these ongoing debates, it is safe to say that a pragmatic view of causation is most appropriate to intervention effectiveness studies. Most program outcomes of interest are the result of numerous and interacting causes: some that are potentially changeable (such as home environment) and some that are much less so (such as genetic influences). What we expect of the best interventions is to partially influence some outcomes, under specific conditions and circumstances, by modifying one or more causal factors. But how do we know when this has happened?
According to the nineteenth-century philosopher John Stuart Mill, at least three criteria must be invoked in justifying causal claims: (1) association (or correlation: the cause is related to the effect), (2) temporality (the cause comes before the effect), and (3) elimination of plausible alternative explanations (other plausible explanations for an effect are considered and ruled out). The key is that all three are necessary, yet sometimes the second and often the third of these criteria are neglected in the design and interpretation of evaluation studies. And even when the third criterion is explicitly addressed by the evaluation, it is often arguable whether or not a sufficient number of the most likely plausible explanations have been considered.
It is widely recognized that correlation does not necessarily imply causation, yet erroneous causal attributions are commonly made based on association or correlation alone. Consider the potential conclusion that adolescents' levels of psychological attachment to their families are a cause of observed differences in problem behavior levels, based on a correlation between these two variables. Although this conclusion might in fact be valid, the correlation alone does not provide sufficient supportive evidence for its validity; any number of alternative explanations could fit the observed relationship. For example, lower levels of problem behavior might strengthen family attachment rather than the other way around. Or a third factor, such as patterns of parental conflict, might independently influence both attachment and problem behavior.
When both of Mill's first two conditions (association and temporality) hold, it can be even more tempting to erroneously infer causation without considering other plausible explanations. As an example, consider the National Longitudinal Study of Adolescent Health (commonly known as the Add Health study) (Resnick and others, 1997). This large correlational study has yielded countless associations among adolescent behaviors, background conditions, health outcomes, and other factors. And because it was longitudinal, involving linked measurements over time from the same participants, some of these associations have been examined for the temporality expected for a cause-and-effect relation. Yet little effort has been invested in addressing the third critical criterion for causality: identifying and ruling out plausible alternative explanations.
A compelling illustration is provided by a widely publicized Add Health study conclusion that virginity pledge programs "cause virginity," that is, delay initiation of sexual intercourse (Bearman and Brueckner, 2001). Complex statistical methods, such as survival analysis and logistic regression, were used to reach this conclusion. Several qualifications regarding the program setting were appropriately discussed, most notably that to have an effect, the pledge must occur in a community of other pledgers that is neither too small nor too large relative to the total student population in the school. The authors, however, neglected sufficient consideration of plausible alternative explanations. Foremost among these would be the possibility that a pre-existing disinclination to initiate sex might have been a primary causal factor behind both signing the pledge and delaying intercourse. If true, this alternative explanation implies that signing the virginity pledge serves as a marker to identify those youth who delay intercourse for any number of other reasons and that, in the absence of pledging, the pattern of sexual initiation would be largely unchanged. This alternative arises from the likelihood of a strong self-selection effect, meaning that participants determine for themselves whether they will be part of the intervention group (in this case, those who pledged) or the control group (those who did not). It is likely that pre-existing differences between those who chose to pledge and those who did not (most notably, differences in the intent to delay intercourse) are not only related to the intervention group assignment but are arguably among its most important determinants.
A statistical adjustment procedure intended to remove the effect of self-selection was described in an appendix to the article, but this procedure was both logically and statistically inadequate (see Pedhazur and Schmelkin, 1991, pp. 295-296 for a discussion on the futility of this type of adjustment). Instead, the researchers' conclusions regarding a pledge effect suggest a criterion for causality of post hoc ergo propter hoc ("after this, therefore because of this"), a fundamental fallacy of logic, known since classical times, that involves inferring a causal relation on the basis of correlation and temporality alone.
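The self-selection concern can be made concrete with a small simulation. The following sketch is purely illustrative (the variable names, effect sizes, and selection model are hypothetical, not drawn from the Bearman and Brueckner data): a single latent trait, a pre-existing disinclination to initiate sex, drives both the decision to pledge and the delay of intercourse, while pledging itself has no causal effect whatsoever.

```python
import math
import random

random.seed(0)

n = 10_000
pledged_delay = []
nonpledged_delay = []

for _ in range(n):
    # Latent pre-existing disinclination to initiate sex (an unobserved confounder).
    disinclination = random.gauss(0.0, 1.0)
    # Self-selection: disinclined youth are more likely to sign the pledge.
    # Pledging itself has NO causal effect on delay in this simulation.
    pledges = random.random() < 1 / (1 + math.exp(-(disinclination - 1.0)))
    # Delay of intercourse depends only on the same latent trait plus noise.
    delays = (disinclination + random.gauss(0.0, 1.0)) > 0.5
    (pledged_delay if pledges else nonpledged_delay).append(delays)

rate_pledged = sum(pledged_delay) / len(pledged_delay)
rate_nonpledged = sum(nonpledged_delay) / len(nonpledged_delay)

# A naive comparison shows pledgers delaying intercourse substantially more
# often, even though the true causal effect of pledging is exactly zero.
print(f"delay rate, pledgers:     {rate_pledged:.2f}")
print(f"delay rate, non-pledgers: {rate_nonpledged:.2f}")
```

The simulated association satisfies both correlation and temporality, yet the "pledge effect" here is entirely an artifact of who chooses to pledge.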
This study, nevertheless, has generated extensive media coverage and policy discussion (see, for example, Boyle, 2000; Nesmith, 2001; Schemo, 2001; Willis, 2001) and has had a substantial influence on federal policy about sexuality education. Prior to this study, the U.S. Department of Health and Human Services had required as performance measures for the evaluation of federally funded abstinence education programs "the proportion of program participants who have engaged in sexual intercourse" and the birth rate of female program participants (Federal Register, 2000). Two years later, on the heels of extensive media attention to Bearman and Brueckner's (2001) study, these sexual behavior and birth rate measures were replaced with the "proportion of youth who commit to abstain from sexual activity until marriage" (Department of Health and Human Services, 2002). Thus, virginity pledging has become the primary behavioral outcome to be measured.
If one reads the various critiques and summaries of the pledge study and its conclusions, it is remarkable to find no mention of the obvious plausible alternative explanation of a pre-existing disinclination among pledgers. Instead, the critiques tend to focus on the limited conditions under which the intervention is believed to be effective and the negative side effects observed (for example, that pledgers who break the pledge were less likely to use contraception than nonpledgers). Yet in considering the original question--Do virginity pledges cause initiation of sexual intercourse to be delayed?--the answer remains that they might or might not. This particular study adds little or nothing to our knowledge of this wished-for effect.
-- (from pages 239-241)
Experimental and Quasi-Experimental Designs
The pledge study example sets the stage for a brief review of experimental and quasi-experimental study designs. A more comprehensive overview of this topic is provided by Reichardt and Mark (1998), and a definitive coverage can be found in Shadish, Cook, and Campbell (2002). The critical missing design element in the pledge study was a controlled manipulation, that is, random or other controlled assignment of the pledge program to some schools or classrooms and not others. Random assignment would be characteristic of a true experimental design, whereas nonrandom assignment strategies could be part of a quasi-experimental design. By contrast, the virginity pledge study design was purely correlational, in that no manipulation of intervention delivery across schools, classrooms, or other units took place. With a good experimental or quasi-experimental design, the plausible alternative explanation for the virginity effect could have been ruled out or rendered unlikely, and then the potential effectiveness of pledging could have been examined more appropriately. The admonition "no causation without manipulation" (commonly attributed to Paul Holland) might be somewhat exaggerated for effect, yet it is a useful heuristic for raising a red flag whenever one encounters claims of program effects based on self-selected participation in an intervention program. Correlational designs do have a variety of appropriate and important uses, such as developing hypotheses to be tested in subsequent studies. However, their utility in eliminating plausible alternative explanations is limited.
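The value of controlled manipulation can be illustrated with a brief simulation, which is a sketch under hypothetical assumptions (the selection model and effect sizes are invented, not taken from any study). The same zero-effect "program" is assigned two ways: by self-selection driven by a latent disinclination, and by random assignment. Only the randomized comparison recovers the true (null) effect.

```python
import math
import random

random.seed(1)
n = 20_000

def delays_intercourse(disinclination):
    # The outcome depends only on a latent trait plus noise;
    # the program has no causal effect by construction.
    return (disinclination + random.gauss(0.0, 1.0)) > 0.5

def observed_effect(randomized):
    """Difference in outcome rates between program and comparison groups."""
    treated, control = [], []
    for _ in range(n):
        d = random.gauss(0.0, 1.0)
        if randomized:
            # Controlled manipulation: assignment is independent of the trait.
            in_program = random.random() < 0.5
        else:
            # Self-selection: disinclined youth opt in more often.
            in_program = random.random() < 1 / (1 + math.exp(-(d - 1.0)))
        (treated if in_program else control).append(delays_intercourse(d))
    return sum(treated) / len(treated) - sum(control) / len(control)

# The self-selected design shows a large spurious "effect";
# the randomized design correctly shows an effect near zero.
print(f"apparent effect, self-selected: {observed_effect(False):+.2f}")
print(f"apparent effect, randomized:    {observed_effect(True):+.2f}")
```

Randomization breaks the link between the unobserved trait and group assignment, which is precisely the design element the pledge study lacked.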
-- (from pages 241-242)
Transparency and Accessibility
A preliminary draft of the virginity pledge study discussed earlier was released by the authors in July 2000--six months prior to its official publication date (January 2001) in the American Journal of Sociology. However, the printed journal was not mailed to libraries or subscribers until June of 2001, creating, in effect, a one-year interval in which the report was extensively discussed in the national media and among policy advocates on both sides of the issue, yet without access to the final published version of the article. Many discussants were limited to commenting on the press release or the few selected details reported in the media.
-- (from page 250)
Bearman, P. S., & Brueckner, H. (2001). Promising the future: Virginity pledges and first intercourse. American Journal of Sociology, 106, 859-912.
Constantine, N. A., & Braverman, M. T. (2004). Appraising evidence on program effectiveness. In M. T. Braverman, N. A. Constantine, & J. K. Slater (Eds.), Foundations and evaluation: Contexts and practices for effective philanthropy (pp. 236-258). San Francisco: Jossey-Bass.
Other references cited can be found in the book, Foundations and Evaluation.
For comments and questions about this critique, and our replies, see the critique discussion.