Do Virginity Pledges Cause Virginity?
In a recently published critique (2004) we used the Bearman and Brueckner (2001) virginity pledge study to illustrate the problem of dubious claims of program effectiveness that are erroneously based on non-experimental correlational data.
One reason why we believe that this critique and the resulting discussion are important is that proponents of increased federal funding for abstinence-only-until-marriage education programs have made extensive use of this study to justify their cause (for example, see Rector, 2002). This federal funding (P.L. 104-193, Section 510 of the Social Security Act; see Solomon-Fears, 2004, pp. 3-6) requires states to divert matching funds away from comprehensive sexuality education. All states except California originally participated, although Maine, Pennsylvania, and New Jersey have subsequently dropped out of the program.
A letter offering the opportunity to respond to this critique was sent to Peter Bearman in August 2004. Dr. Bearman provided a spirited reply, but declined permission to post this response. A similar letter was sent to Dr. Brueckner, the article's second author, who declined to respond. We then solicited additional comments and questions from various others, and from these have created the following anonymous FAQ, including our responses to the various questions and issues raised by Dr. Bearman. We remain willing to post attributable commentary from Dr. Bearman or Dr. Brueckner, should either wish to provide it.
The latest question and reply were posted on February 18, 2005 (see question 13 below). A comment by Dr. Michael Resnick of the Add Health study research team was posted on February 28 and appears immediately below. All replies were written by Norm Constantine, first author of the original critique. Thanks to everyone who has submitted comments and questions.
What do you think? Let us know at email@example.com.
Comment from Dr. Michael Resnick, Add Health Study Research Team Member, February 28, 2005 (used with permission)
Wanted to drop a line to let you know how appreciative I am of your careful inquiry into how these analyses are being used to drive agendas that go way above and beyond any statistical findings. Keep it up; what you do matters. In appreciation,
Michael D. Resnick, Ph.D.,
Professor of Pediatrics and Director,
Healthy Youth Development Prevention Research Center,
Division of General Pediatrics and Adolescent Health,
Department of Pediatrics, University of Minnesota
Posted October 4, 2004
Question 1. Isn't it misleading to state that the Bearman and Brueckner article argued that virginity pledges "cause virginity"? Did the authors ever use that exact phrase?
No, they did not use the quoted phrase, nor were the quotes intended to attribute the phrase to them. The sentence you refer to is as follows:
"A compelling illustration is provided by a widely publicized Add Health study conclusion that virginity pledge programs 'cause virginity,' that is, delay initiation of sexual intercourse (Bearman and Brueckner, 2001)." (Constantine & Braverman, 2004, p.240)
The quotes around the phrase "cause virginity" were meant to indicate that it was our phrase, and the citation to Bearman and Brueckner was meant to apply directly to their conclusion at the end of this sentence - delay initiation of sexual intercourse. Our use of the phrase "cause virginity" was somewhat tongue-in-cheek; in fact, we originally had titled the section of the chapter in which it appeared "Causation, Correlation and Alternative Explanations: Do Virginity Pledges Cause Virginity?" However, our copy editor at Jossey-Bass convinced us to shorten this section title to just the first five words, in the interest of brevity.
While we want to be clear for anyone who does not appreciate our tongue-in-cheek phraseology that "cause virginity" is not a direct quote attributed to Bearman and Brueckner, we also argue in the chapter that the authors are indeed making this causality claim -- one which we believe to be both logically and statistically unsound.
Question 2. You argue that the effect observed could be a selection effect, but this could not be true since the effect is totally driven by social context -- there is no effect of taking a pledge when there are no pledgers in the community, and no effect when there are too many.
To state that "the effect is totally driven by social context" requires an assumption that there is an effect, and this is exactly what we dispute. The correlation between pledging and initiation of sexual intercourse might very well vary by social context; however, this in no way demonstrates that there is a pledge effect on delaying sexual initiation. That was our major point.
Yes, we did observe, and in fact discussed, the reported qualification that the purported effect was highly dependent on the proportion of co-pledgers in the community. This does not change our argument that no convincing evidence was presented of a pledge effect on initiation. (See also question 11 for a more compelling answer to this question.)
Question 3. You called for a randomized individual-level design, but this would not be able to solve this problem. The experimental context would have to be elevated to communities -- a much more interesting problem than the one you describe.
Actually, we never suggested an individual-level randomized design; what we did suggest was a community-level group randomized or quasi-experimental design, with community defined at the level of the school or classroom:
"The critical missing design element in the pledge study was a controlled manipulation, that is, random or other controlled assignment of the pledge program to some schools or classrooms and not others. Random assignment would be characteristic of a true experimental design, whereas nonrandom assignment strategies could be part of a quasi-experimental design. By contrast, the virginity pledge study design was purely correlational, in that no manipulation of intervention delivery across schools, classrooms, or other units took place. With a good experimental or quasi-experimental design, the plausible alternative explanation for the virginity effect could have been ruled out or rendered unlikely, and then the potential effectiveness of pledging could have been examined more appropriately. The admonition "no causation without manipulation" (commonly attributed to Paul Holland) might be somewhat exaggerated for effect, yet it is a useful heuristic for raising a red flag whenever one encounters claims of program effects based on self-selected participation in an intervention program." (Constantine & Braverman, 2004, p. 241)
Question 4. Why quibble, virginity pledges don't really do any harm, and research results are rarely conclusive anyway, so why not support pledge programs?
We disagree. Programs that give false hope, that compete for funds and time, and that might prevent other more effective programs from being implemented, can cause harm to individuals and communities. True, research results from a single study are rarely conclusive, and we make this point throughout the chapter. But some results are more convincing than others, and over time results across good studies can cumulatively address important questions about program effectiveness to a pragmatically useful degree. The primary conclusions from the Bearman and Brueckner study, we believe, violate basic foundations of logical scientific inference. We do not believe this particular study contributes constructively to the cumulative research pool, nor that its influence on national policy has been harmless.
Question 5. Are you saying that all Add Health (Adolescent Health Study) research is flawed? Were the conservatives right in trying to block funding for this study when it was first proposed?
No to both questions. However, there are many other examples of papers being published based on the Add Health data set - by the Add Health research group and by others - that employ flawed correlational logic and unfounded claims of causal relations, i.e., inferring causation based on correlation and temporality alone. These flaws are not in the data collection, but rather in some of the uses of the data.
Question 6. You dismiss the statistical manipulations performed to adjust for the effects of self-selection as logically and statistically inadequate. But isn't this standard statistical methodology? After all, some programs just can't be randomized.
It is certainly true that some programs just can't be randomized, and we make this point in the chapter. Whether potential effects of virginity pledges on sexual initiation could be evaluated by a randomized study is another question (see question 3 above). Yet, even if a randomized study would be logistically or otherwise impossible for this intervention, this would not in any way justify or validate logically and statistically unsound adjustments of correlational data.
In the chapter we cited Pedhazur and Schmelkin (1991, pp. 295-296) for a discussion on the futility of this type of adjustment. This is well worth reading, but is far from the only discussion on the futility of adjusting for selection effects in non-experimental studies. Pedhazur is an educational researcher and methodologist; epidemiologists say the same thing, for example:
"Observational studies revealed strong apparently protective effects of beta-carotene, but long term RCTs found that, if anything, beta-carotene increased cardiovascular disease risk. There are now a series of similar examples: hormone replacement therapy, vitamin E and vitamin C intake in relation to cardiovascular disease, or fiber intake in relation to colon cancer among them. What these examples have in common is that the groups of people who were apparently receiving protection from these substances in the observational studies were very different from the groups not using them, on a whole host of characteristics of their lives. Belief that these differences could be summed up in measures of a few "potential confounders" and adequately adjusted for in statistical analyses, fails to recognize the complexity of the reasons why people differ with regard to particular and general characteristics of their lives." (Davey Smith & Ebrahim, 2001, p. 5)
In an early version of the chapter we addressed this issue in more detail in a footnote, however, because the book is intended for a nontechnical audience, our publisher's copy editor asked us to remove this technical footnote, and we complied. The original footnote read as follows:
"Self selection into the pledge group by inclined abstainers was statistically modeled using logistic regression analysis, and the resulting significant predictors of pledging (e.g., religiosity) were then included in the virginity survival analyses. This was intended to eliminate the self-selection bias. This technique is logically questionable, yet even if the logic were accepted it would require an adequately modeled selection relationship to be effective (see e.g., Pedhazur & Schmelkin, 1991, pp. 295-296). To the extent that one or more variables necessary to predicting self-selection are missing from the model, the model is considered misspecified, and leads to biased or totally incorrect estimates of effects. As Pedhazur & Schmelkin (p. 295) warn - 'Analytic techniques, no matter how fancy they may be, cannot salvage a misspecified model.'
Although the total proportion of variance explained by the authors' selection model was, inexplicably, not reported, it is clear that the modeled selection relationship was far from adequate. For example, one of the adjustment variables involved a tortuous statistical transformation (percent of same sex pledgers in school, squared, with its regression coefficient inexplicably multiplied by 10; see footnote to table B1 on page 908 of Bearman & Brueckner, 2001), and for this the authors reported an odds ratio confidence interval of .98 - .99 (note that an odds ratio of 1.0 is the definition of no relationship). Yet the most blatant limitation in their modeling of the selection relationship, we contend, was the omission of a reliable and valid measure of intent to initiate sexual intercourse at the time that the pledge was offered. Inclusion of such a measure still would not have perfectly modeled self-selection, yet its omission renders any conclusions about pledge effects illusory, or at best hopelessly confused.
We apologize to our non-statistically trained readers for this digression into arcane statistical issues, and only ask that Light and colleagues' (1990) previously discussed admonition be kept in mind - 'you cannot fix with statistics what you bungle in design.'" [end footnote]
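The misspecification problem this footnote describes can be illustrated with a small simulation of our own (all parameters are invented for illustration; this is not the authors' model or data). Here a latent propensity to abstain drives both pledging and the outcome, the pledge itself has zero true effect, and stratifying on an observed proxy (religiosity), a crude stand-in for including it as a regression covariate, shrinks but does not eliminate the spurious "pledge effect":

```python
# Toy simulation of self-selection bias (illustrative parameters only,
# not the Bearman & Brueckner model or the Add Health data).
import math
import random

random.seed(42)
N = 100_000

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

people = []
for _ in range(N):
    religiosity = random.gauss(0, 1)                           # observed covariate
    propensity = 0.6 * religiosity + 0.8 * random.gauss(0, 1)  # latent, unmeasured
    pledged = random.random() < sigmoid(propensity - 1)        # self-selection into pledging
    virgin = random.random() < sigmoid(propensity)             # outcome: zero true pledge effect
    people.append((religiosity, pledged, virgin))

def virgin_rate(group):
    return sum(v for _, _, v in group) / len(group)

def gap(group):
    """Pledger minus non-pledger virginity rate within a group."""
    pledgers = [x for x in group if x[1]]
    nonpledgers = [x for x in group if not x[1]]
    return virgin_rate(pledgers) - virgin_rate(nonpledgers)

raw_gap = gap(people)
# "Adjust" for religiosity by stratifying on it; the unmeasured part of the
# propensity still differs between pledgers and non-pledgers within strata.
low_gap = gap([x for x in people if x[0] <= 0])
high_gap = gap([x for x in people if x[0] > 0])

print(f"raw 'pledge effect':       {raw_gap:.3f}")
print(f"religiosity-adjusted gaps: {low_gap:.3f} (low), {high_gap:.3f} (high)")
```

The persistent within-stratum gap is the signature of the omitted variable: whatever part of the propensity to abstain is not captured by the measured covariates remains free to masquerade as a program effect.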
Question 7. In your chapter, you accused the virginity study's authors of "post hoc ergo propter hoc." Was it really necessary to resort to Latin profanity to make your point?
This question elicits a chuckle, but it's actually a good question. Of course this Latin phrase is not typically considered profane, although it is understandable that a researcher might be put off by a suggestion of fallacious logic. As we stated in the chapter,
"The researchers' conclusions regarding a pledge effect suggest a criterion for causality of post hoc ergo propter hoc ("after this, therefore because of this"), a fundamental fallacy of logic, known since classical times, that involves inferring a causal relation on the basis of correlation and temporality alone." (Constantine & Braverman, 2004, p. 240).
A more modern designation for this type of logical fallacy is "cargo cult science" (a phrase coined by physicist Richard Feynman), but the story behind this phrase takes longer to explain, hence our preference for the Latin terminology.
We believe that it is appropriate and useful for researchers to criticize each other's conclusions. In fact, it is necessary to scientific progress. Sociologist Robert Merton perhaps said it best: "Most institutions demand unqualified faith; but the institution of science makes skepticism a virtue." Richard Feynman and Carl Sagan have made similar statements, as have many of the most prominent scientists of the past century. Scientific criticism and skepticism are best based in logic and reason; thus a suggestion of post hoc ergo propter hoc, we believe, is a reasonable and appropriate critique, especially when backed up by a logical justification.
Question 8. A replication by an independent research group has recently been published that finds essentially the same results. Doesn't this validate the Bearman and Brueckner study?
Not at all. This new web-published study (Rector et al., 2004) was conducted by researchers at the Heritage Foundation, whose stated mission is to "formulate and promote conservative public policies." It was not a true replication: it employed the same data set, involving the same adolescents, that Bearman and Brueckner analyzed (now including a third wave of data), and it modeled its methodology on Bearman and Brueckner's. The difference is that it reached further in its questionable attributions of causality, claiming that virginity "pledges appear to have a strong and significant effect in encouraging positive and constructive behavior among youth" and providing a longer list of sexual behaviors purportedly affected by pledging. Although its statistical adjustment procedures appear to be somewhat less convoluted than Bearman and Brueckner's, they were equally unsound, both logically and statistically. Thus, the replies to questions 6 and 7 above apply here as well. We will more fully address the flaws in this new study in subsequent writing.
Posted October 6, 2004
Question 9. You have criticized the authors' (Bearman & Brueckner) adjustments for self-selection into the pledge condition as being both logically and statistically inadequate. You discussed the statistical issues in some detail, which frankly will sound like fingernails scraping across a blackboard to most people. However, I don't see where you've specifically addressed how the statistical adjustment for self-selection was logically inadequate.
Good point; a statistical criticism of complex statistical manipulations is difficult to explain in plain language. But the logical criticism should be more readily accessible.
First, to review, the statistical problem with the attempted adjustment for self-selection is that the regression model is misspecified, mainly because it leaves out a critical predictor variable for self-selection into the virginity pledge group, that is, a valid measure of preexisting propensity to remain a virgin. Other statistical problems were noted as well (see question 6). These statistical issues are fundamental problems that alone would invalidate the authors' conclusions; however, it is interesting to also look separately at the logical problems of adjusting for self-selection.
The logical problem is as follows - even if a more adequately specified model were to be employed to adjust for self-selection, one would then be simulating a contrived, unlikely, and illogical condition by attempting to statistically remove this propensity to remain a virgin from the self-selected pledgers. In other words, the research question then becomes, "would the pledge be effective if these particular self-selected pledgers had not been already inclined to avoid sex?" Think about this for a minute.
As Lord (1967, p. 305; quoted in Pedhazur & Schmelkin, 1991) cautioned, "With the data usually available for such studies, there is simply no logical or statistical procedure that can be counted on to make proper allowances for uncontrolled preexisting differences between groups. The researcher wants to know how the groups would have compared if there had not been preexisting uncontrolled differences. The usual research study of this type is attempting to answer a question that simply cannot be answered in any rigorous way on the basis of the available data."
Or, as Anderson (1963) most succinctly put it: "One may well wonder what exactly it means to ask what the data would look like were they not what they are." Amen.
Although more complex and sophisticated adjustment procedures are now available than those that Lord and Anderson had access to, the logical validity of their (and our) criticisms is not affected.
It might be helpful to note how our logical and statistical criticisms fit together, given that we argue that either one alone invalidates the authors' conclusions. The logical flaw involves the authors' justification for the type of analyses they attempted. We have shown why this justification was inadequate. The statistical flaws involve the authors' implementation of their analyses. We have explained why even if one were to accept the authors' logical argument for their self-selection adjustment analyses, these analyses were incorrectly implemented -- primarily due to a fundamental omission in their adjustment model.
Posted October 11, 2004
Question 10. But wasn't the Bearman and Brueckner article published in a peer-reviewed journal? If what you say is true, then how could it have survived the peer review process?
As we mentioned in another part of the chapter (Constantine & Braverman, 2004, pp. 250-251), pre-publication peer review is a notoriously unreliable process. It does, however, serve an important purpose. Maybe one way to put it is that pre-publication peer review is a necessary but not sufficient step in appraising the quality of published research.
Writing in the Journal of the American Medical Association, Altman (2002) recently noted that "Many readers seem to assume that articles published in peer-reviewed journals are scientifically sound, despite much evidence to the contrary. It is important, therefore, that misleading work be identified after publication." In other words, post-publication peer review is a critical component of the peer review process. (An earlier editorial on the same topic (Altman, 1994) is also available.)
Posted November 10, 2004
Question 11. Regarding the virginity pledge, are there alternative explanations that would explain why self selection would only appear statistically if some of the students in the school said they pledge, but not all of them did? For example, at the extreme, if no one at the school pledges, then this probably means they did not have the opportunity to pledge and hence no self selection. If everyone pledges, then this means it was probably a required activity in a required course, and hence no self selection. If just some in the school pledge, then there is the greatest opportunity for self selection. Does this make sense? If so, I think it is a stronger criticism of the claim that the social context findings mean that self selection did not occur.
Yes, absolutely. Thank you for this insightful observation. This is a more compelling answer to question 2 than the answer we provided above.
Another way to view this argument would be to rephrase Dr. Bearman's oft-quoted statement: "Policy makers should recognize that the pledge works because not everyone is pledging" to something of the sort: "Policy makers should recognize that a spurious pledge effect appears when the pledge group consists only of those students who self-select into the pledge condition and are already more likely than non-pledgers to abstain."
Posted January 23, 2005
Question 12. Can you recommend some further reading about the logical fallacies you raise?
For a more comprehensive but very accessible coverage, you might try T. Edward Damer's classic, Attacking Faulty Reasoning: A Practical Guide to Fallacy-Free Arguments.
Posted February 18, 2005
Question 13. Today I read a statement from Peter Bearman in reaction to a study in Texas, where he noted that abstinence-only programs have never been shown to be effective. What's going on here -- is Dr. Bearman repudiating his own research?
Thank you for raising this issue. The statement attributed to Dr. Bearman, as reported in a February 2005 New Scientist article on abstinence-only programs in Texas, is as follows: "These kinds of programmes have never been shown to be effective," he says. "This is consistent with almost every other study that has ever been done."
Although we commend Dr. Bearman for speaking out in this manner, we agree that this statement appears to be at odds with his earlier conclusion that "Adolescents who pledge are much less likely to have intercourse than adolescents who do not pledge. The delay effect is substantial." (Bearman & Brueckner, 2001, p. 859)
We can think of three potential explanations for this apparent inconsistency. As usual, we invite Dr. Bearman to add to this discussion.
1. Dr. Bearman might be strictly distinguishing between abstinence-only and virginity pledge programs, in other words he might believe that abstinence-only programs have never been shown to be effective, but that virginity pledge programs have been shown to be effective. This distinction is not tenable. While it is true that some abstinence-only programs do not include virginity pledge components, and that some virginity pledge programs might not be abstinence-only, most virginity pledge programs are indeed abstinence-only, and the two approaches are in fact strongly linked in practice. The Heritage Foundation, for example, explicitly defines virginity pledge programs as "real abstinence (or what is conventionally termed an "abstinence only") program(s); that is, the program does not provide contraceptives or encourage their use." (Rector, 2002)
Furthermore, while no federal funding that we know of is specifically allocated to virginity pledge programs, virginity pledge effectiveness claims have been used as a primary source of evidence by proponents of abstinence-only funding in the policy debates of the last several years. And as we noted in our original critique:
Prior to this [virginity pledge] study, the U.S. Department of Health and Human Services had required as performance measures for the evaluation of federally-funded abstinence education programs "the proportion of program participants who have engaged in sexual intercourse" and the birth rate of female program participants (Federal Register, 2000). Two years later, on the heels of extensive media attention to Bearman and Brueckner's (2001) study, these sexual behavior and birth rate measures were replaced with the "proportion of youth who commit to abstain from sexual activity until marriage" (Department of Health and Human Services, 2002). (Constantine & Braverman, 2004, p. 241)
In other words, virginity pledging is now the primary behavioral outcome measure required for federally funded abstinence-only programs, reinforcing the fundamental connection between the two approaches. Thus, we do not believe that a distinction between abstinence-only and virginity pledge programs is a tenable explanation for the apparent inconsistency between Dr. Bearman's earlier and current statements.
2. It is possible that Dr. Bearman's virginity pledge effectiveness claims were intended to be technically excluded from his current statement through his use of the qualifier "almost" (i.e., the absence of effectiveness findings in the Texas program evaluation was "consistent with almost every other study that has ever been done").
3. It is also possible that Dr. Bearman has reconsidered the evidence and now agrees that his original findings were more likely the result of a self-selection effect than a genuine pledge effect, hence he no longer believes that his study should be cited as evidence of a "substantial delay effect" of virginity pledging. If this is the case, his statement that "these kinds of programmes have never been shown to be effective" would apply without qualification to his own study as well.
What do you think? Let us know at firstname.lastname@example.org.
Further comments and questions will be posted here as appropriate.
Altman, D.G. (1994). The scandal of poor medical research. BMJ, 308, 283-284.
Altman, D.G. (2002). Poor quality medical research: What can journals do? Journal of the American Medical Association, 287, 2765-2767.
Anderson, N. H. (1963). Comparison of different populations: Resistance to extinction and transfer. Psychological Review, 70, 162-179.
Bearman, P. S. & Brueckner, H. (2001). Promising the future: Virginity pledges and first intercourse. American Journal of Sociology, 106, 859-912.
Constantine, N. A. & Braverman, M. T. (2004). Appraising evidence on program effectiveness. In M. T. Braverman, N. A. Constantine, and J. K. Slater (Eds.), Foundations and evaluation: Contexts and practices for effective philanthropy (pp. 236-258). San Francisco: Jossey-Bass.
Davey Smith, G. & Ebrahim, S. (2001). Epidemiology -- Is it time to call it a day? International Journal of Epidemiology, 30, 1-11.
Light, R. J., Singer, J. D., & Willett, J. B. (1990). By design: Planning research on higher education. Cambridge, MA: Harvard University Press.
Pedhazur, E. J. & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Rector, R. E. (2002). The effectiveness of abstinence education programs in reducing sexual activity among youth. Washington, DC: The Heritage Foundation.
Rector, R. E., Johnson, K. A., & Marshall, J. A. (2004). Teens who make virginity pledges have substantially improved life outcomes. Washington, DC: The Heritage Foundation.
Solomon-Fears, C. (2004). Reducing teen pregnancy: Adolescent family life and abstinence education programs. Washington, DC: Congressional Research Service.
© 2004, 2005 Public Health Institute