School of Public Health
University of California, Berkeley
Theory-Based Data Analysis in Public Health Research
PB HLTH 298-64 (Group Study), Spring 2010
Instructor: Norm Constantine
237 University Hall
Units: 2-4 units depending on projects, by arrangement
When offered: Tuesdays, 4:00 to 6:00 p.m.
Location: 590-L University Hall
CC# 76679
Participants: Developed for DrPH students; other doctoral and advanced master’s students are
welcome by special arrangement as space permits (max=8)
Course Description
This seminar is intended to assist students in (1) developing their abilities to
critically appraise the design and interpretation of multivariable analyses of
non-experimental quantitative data, and (2) translating the theory guiding
their own dissertation research into a coherent plan for analysis.
Five types of analytical strategies for multivariable analysis will be reviewed and
critiqued. These include three atheoretical strategies – bivariate
(unadjusted), simultaneous (standard), and automated (stepwise, perhaps better
classified as an anti-theory strategy) – as well as two theory-based strategies
– sequential (hierarchical) and elaborative (exclusionary/inclusive). The
focus of this seminar will be on these last two theory-based strategies, and on
conceptual design and causal inference issues, rather than statistical analysis
issues.
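As a rough illustration of the sequential (hierarchical) strategy named above, the following sketch enters predictor blocks in an order fixed in advance and reports the change in R² at each step. It is a minimal sketch on simulated data, not a course requirement: the variable names (`distal`, `proximal`, `outcome`), the effect sizes, and the distal-then-proximal ordering are all assumptions made for illustration, and the least-squares solver is written out by hand only so that the example needs no statistical package.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def r_squared(predictors, y):
    """R^2 from an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(coefs, row)) for row in X]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_resid / ss_total

# Simulated data (hypothetical): a distal factor influences a proximal factor,
# and both influence the outcome.
random.seed(0)
n = 500
distal = [random.gauss(0, 1) for _ in range(n)]
proximal = [0.6 * d + random.gauss(0, 1) for d in distal]
outcome = [0.3 * d + 0.5 * p + random.gauss(0, 1) for d, p in zip(distal, proximal)]

# Step 1 enters the distal block alone; step 2 adds the proximal block.
# The order of entry is chosen by the analyst's theory, not by the software.
r2_step1 = r_squared([distal], outcome)
r2_step2 = r_squared([distal, proximal], outcome)
r2_change = r2_step2 - r2_step1

print(f"Step 1 R^2: {r2_step1:.3f}")
print(f"Step 2 R^2: {r2_step2:.3f}")
print(f"R^2 change: {r2_change:.3f}")
```

An automated (stepwise) procedure would instead let the data choose the order of entry, which is why the same arithmetic can serve either a theory-based or an atheoretical strategy.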
The group will meet for weekly three-hour sessions. Each session will include (1) discussion
of concepts and issues from the current week’s readings, and (2) discussion and
critique of students’ own research and/or student-presented published examples
from their own public health specialty areas. Both elements are priorities of
this course, and as much as possible, the two will be integrated.
The etiology of this course is similar to that described by sociologist and UCLA
School of Public Health Professor Carol Aneshensel in her preface to our
primary text. We will build on her and her students’ experiences and insights to
achieve some of the same successes:
“This text was conceived at a student's
preliminary doctoral exam, and its development has been a response to questions
posed by students in the graduate course that eventually grew from this
brainchild. I was struck, not for the first time, by a technically correct
description of a series of statistical techniques that had little relevance to
the theory to be tested in the proposed research. Lest students think they have
been unfairly singled out, I should mention that this disconnection is also
found in the proposals of more seasoned investigators. I was puzzled because
the students who struggled most with their data analysis were uniformly bright,
articulate in their understanding of theory, and, most perplexingly, well
trained in the techniques of multivariate statistics. This pattern suggested
that something was missing from the way we were training the next generation of
social scientists.
“With considerable chutzpah, I started a
seminar on multivariate data analysis to rectify this problem. I soon
discovered that I had no idea how to teach someone else how to translate his or
her theory into a coherent plan for data analysis. Fortunately students in
those first few years did not seem to realize this chaotic state of confusion
and found that the seminar enabled them to integrate what they had learned over
several years in other courses. I was pleased, of course, and more than ready
to accept the credit for their astute insights. In retrospect, it is clear
students were able to bring their ideas and their analyses closer together
because they presented their analyses to their classmates, responded to their
critiques, and offered the same in exchange. This book is the result of
eavesdropping on their conversations.”
-- Aneshensel (2002)
Prerequisites
This is a conceptual applications seminar and is not
focused on statistical mechanics. To benefit from the seminar it is necessary
to have sufficient understanding of the basic concepts of multiple linear
regression, such as variance and covariance, R², b and Beta
coefficients, partial and semi-partial correlation, statistical control, regression
assumptions, etc. Prior completion of one of the following is expected:
PH241 Statistical Analysis of Categorical Data; PH241 Multivariate
Statistics; Educ 275B Data Analysis in Educational Research II;
or other equivalent course that sufficiently covers the fundamentals of
multiple linear and/or logistic regression.
It is also expected that students have developed
dissertation or other research questions and have at least begun work on theoretical
frameworks, so that they can apply these to the exercises of this class.
Learning Objectives
After successful completion of this course, students will be able to:
1. Explain the critical role of theory in public health research, and the
fundamental differences between predictive and explanatory research;
2. Explain and illustrate the differences between the statistical problem of
multicollinearity and the interpretative challenge of overlapping covariance;
3. Distinguish between different types of multivariable analytic strategies, and
explain the advantages, disadvantages, and appropriate applications for each;
4. Explain and distinguish between confounding, suppression, mediation, and moderation;
5. Discuss the underdetermination of theory by available evidence and the
Duhem–Quine thesis as they relate to causal inference questions supported
by multivariable analyses;
6. Critically appraise the design, reporting, and interpretation of published
multivariable analyses in the student’s specialty area of public health; and
7. Develop and implement a coherent design, analysis, and report for a
theory-based multivariable analysis in the student’s specialty area of public health.
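To make the distinction in objective 4 a little more concrete, the sketch below simulates a third variable `z` that influences both a focal predictor `x` and an outcome `y`, then compares the unadjusted slope of `x` with the slope adjusted for `z`. This is only an illustrative sketch: the variable names, effect sizes, and sample size are assumptions, and the adjusted slope is computed from the standard two-predictor least-squares formula rather than from any particular statistical package.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (cov(a, a) is the variance)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

# Simulated data (hypothetical): z influences both x and y,
# and the effect of x on y built into the simulation is 0.2.
random.seed(1)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + random.gauss(0, 1) for zi in z]
y = [0.2 * xi + 0.7 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

# Bivariate (unadjusted) slope of y on x.
b_unadjusted = cov(x, y) / cov(x, x)

# Slope of x in the regression of y on x and z (two-predictor OLS formula).
denominator = cov(x, x) * cov(z, z) - cov(x, z) ** 2
b_adjusted = (cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)) / denominator

print(f"Unadjusted slope of x: {b_unadjusted:.3f}")
print(f"Slope of x adjusted for z: {b_adjusted:.3f}")
```

The same drop in the focal coefficient would appear if `z` were an intervening variable on the path from `x` to `y` rather than a common cause, and a rise instead of a drop would be read as suppression. The arithmetic cannot tell these cases apart; the interpretation depends on the theory that specifies the causal ordering of the variables.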
Student Responsibilities
1. Thoroughly study all assigned readings prior to class, and actively and
appropriately participate in all discussions;
2. Identify, present, and critically discuss published examples from students’
specialty area of public health, and contribute to the discussion of examples
from other students’ areas;
3. Develop a theory-based multivariable analysis design based on an appropriate
application of a theory-based strategy, for a research question(s) from the
student’s own specialty area; and
4. Implement the above theory-based multivariable analysis design using a
relevant data set and appropriate statistical software. Provide a written report
and verbal presentation to the seminar.
Required Text
Aneshensel, C. S. (2002). Theory-based data analysis for
the social sciences. Thousand Oaks, CA: Sage, Pine Forge Press.
Recommended Texts
Introductory review: Allison, P. D. (1999). Multiple
regression: A primer. Thousand Oaks, CA: Sage, Pine Forge Press. (An excellent
review of the mechanics and assumptions of multiple regression analysis, from
the same series as the Aneshensel book. However, Allison’s approach is not
theory-based, and in some places it is downright theoretically confused. Yet,
with that caveat in mind, I do otherwise recommend this clearly written,
accessible, and engaging book.)
Comprehensive reference: Tabachnick, B. G.
& Fidell, L. S. (2007). Using multivariate statistics (especially
Chapter 5: Multiple regression; Chapter 10: Logistic regression). Boston, MA:
Allyn and Bacon.
Other Readings (in course outline below)

Course Outline

Week 1
· Introduction and course overview
· Issues in confirmation bias
· Confirmation, refutation, and corroboration
· Prove versus support or discredit
· Discussion and critique of students’ research interests and plans

Week 2
· Prediction vs. explanation
· Review of the nature and importance of theory
· Brief review of basic concepts in multiple regression analysis
· Multicollinearity vs. overlapping covariance
· Discussion and critique of students’ research interests and plans
Readings
1. Phillips, D. C. (2000). The expanded social scientist’s bestiary: A guide to fabled threats to, and defenses of, naturalistic social science. (Preface, and Chapter 6: New philosophy of science).
2. Hughes, J. N. (2000). The essential role of theory in the science of treating children: Beyond empirically supported treatments. Journal of School Psychology, 38, 301–330.
3. Pedhazur, E. J. (1997). Prediction and explanation (pp. 195-198, 211). In Multiple regression in behavioral research: Prediction and explanation.
4. Pedhazur, E. J. & Schmelkin, L. P. (1991). Multiple regression analysis (and cautions on variance partitioning) (pp. 413-428). In Measurement, design, and analysis: An integrated approach.

Week 3
· Review of assumptions in linear regression
· Review of assumptions in logistic regression
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research plans
Readings
1. Tabachnick, B. G. & Fidell, L. S. (2007). Using multivariate statistics (5th edition). New York: Allyn and Bacon. (Assumptions, pp. xxx-xxx)
2. Lumley, T. (2002). The importance of the normality assumption in large public health data sets. Annual Review of Public Health, 23:151-169.

Week 4
· Five analytic strategies in multiple regression
· Victora’s generic hierarchical conceptual framework
· Critique and discussion of parent-adolescent communication example
· Discussion and critique of students’ research plans
Readings
1. Tabachnick, B. G. & Fidell, L. S. (2007). Using multivariate statistics (5th edition). New York: Allyn and Bacon. (Major types of multiple regression, pp. 136-144.)
2. Victora et al. (1997). The role of conceptual frameworks in epidemiological analysis: A hierarchical approach. International Journal of Epidemiology, 26(1).
3. Phillips, D. C. (2000). Chapter 8 (Popperian rules for research design).

Week 5
· Clues to the puzzle of scientific evidence
· Theories and laws
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research plans
Readings
1. Haack, S. (2003). Defending science -- Within reason: Between scientism and cynicism. New York: Prometheus. Preface, and Chapter 3 (Clues to the puzzle of scientific evidence: A more-so story).
2. Phillips, D. C. (2000). Chapter 12 (Theories and Laws).

Week 6
· Rosenberg’s elaboration model
· Confounding and suppression
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research plans
Readings
1. Babbie, E. (2001). The practice of social research (Chapter 16: The elaboration model). Belmont, CA: Wadsworth.
2. Constantine, N. A. (2008). Simpson’s paradox. In S. Boslaugh (Ed.), Encyclopedia of epidemiology: Vol. 1 (pp. 973-974). Thousand Oaks, CA: Sage Publishers.
3. Greenland, S. and Morgenstern, H. (2001). Confounding in health research. Annual Review of Public Health, 22:189–212.

Week 7
· Aneshensel’s elaboration model: exclusionary/inclusive approach
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research designs and analyses
Readings
1. Aneshensel (2002), Preface
2. Aneshensel (2002), Chapter 1: Introduction to Theory-Based Data Analysis
3. Aneshensel (2002), Chapter 2: The Logic of Theory-Based Data Analysis

Week 8
· Associations and relationships
· The focal relationship
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research designs and analyses
Readings
1. Aneshensel (2002), Chapter 3: Associations and Relationships
2. Aneshensel (2002), Chapter 4: The Focal Relationship: Demonstrating Internal Validity

Week 9
· Ruling out alternative explanations
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research designs and analyses
Readings
1. Donald Campbell on ruling out alternative explanations
2. Aneshensel (2002), Chapter 5: Ruling Out Alternative Explanations: Spurious and Control Variables
3. Aneshensel (2002), Chapter 6: Ruling Out Alternative Explanations: Additional Independent Variables

Week 10
· Elaborating an explanation
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research designs and analyses
Readings
1. Aneshensel (2002), Chapter 7: Elaborating an Explanation: Antecedent, Intervening, and Consequent Variables
2. Baron, R. M. & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
3. Frazier, P. A., Tix, A. P., & Barron, K. E. (2004). Testing moderator and mediator effects in counseling psychology research. Journal of Counseling Psychology, 51, 115-134.

Week 11
· Conditions of influence
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research designs and analyses
Readings
1. Aneshensel (2002), Chapter 8: Conditions of Influence: Effect Modification
2. Frazier, P. A., Tix, A. P., & Barron, K. E. (2004). Testing moderator and mediator effects in counseling psychology research. Journal of Counseling Psychology, 51, 115-134.

Week 12
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research designs and analyses

Week 13
· Review, contrast, and critiques of the sequential and elaboration strategies
· The lingering problem of theory under-determination in both strategies
· Group critique of student-selected published examples (readings TBD)
· Discussion and critique of students’ research designs and analyses
Readings
1. Aneshensel (2002), Chapter 9: Synthesis and Comment

Week 14
· Four student presentations and class critique and discussion (10-minute presentation, 20-minute discussion each)

Week 15
· Four student presentations and class critique and discussion (10-minute presentation, 20-minute discussion each)

Words of Wisdom (instructor’s known biases)
· “You cannot fix by analysis what you bungle in design.” (Light, Singer, & Willett, 1990)
· “Occam’s razor applies to methods as well as to theories.” (Wilkinson & the APA Task Force on Statistical Inference, 1999)
· “Analytic techniques, no matter how fancy they may be, cannot salvage a misspecified model.” (Pedhazur & Schmelkin, 1991)
· “With the data usually available for such studies, there is simply no logical or statistical procedure that can be counted on to make proper allowances for uncontrolled preexisting differences between groups.” (Lord, 1967, quoted in Pedhazur & Schmelkin, 1991)
· “One may well wonder what exactly it means to ask what the data would look like were they not what they are.” (Anderson, 1963)
· “A willingness to entertain rival interpretations, an ability to place knowledge within broader contexts, and an openness to new ways of conceptualizing problems are essential to scientific inquiry. Theory serves these functions as well as directs inquiry, unifies and systematizes knowledge, and makes sense of what (might) otherwise be inscrutable empirical facts” (Hughes, 2000)
· “The model does not ‘confirm’ causal relationships. Rather it assumes causal links and then tests how strong they would be if the model were a correct representation of reality.” (Shadish, Cook, & Campbell, 2002)
· “Structural equation modeling is more useful for rejecting false models than for somehow proving whether a given model is in fact true.” (Kline, 1998)
· “More and more I have come to the conclusion that the core of the scientific method is not experimentation per se but rather the strategy connoted by the phrase ‘plausible rival hypotheses’.” (Campbell, 1989)
Words of Wisdom References
Campbell, D. T. (1989). Foreword to R. K. Yin (2003), Case study research design and methods (3rd ed.). Thousand Oaks, CA: Sage. (First edition 1989.)
Hughes, J. N. (2000). The essential role of theory in the science of treating children: Beyond empirically supported treatments. Journal of School Psychology, 38, 301–330.
Kline, R. B. (1998). Principles and practice of structural equation modeling. NY: Guilford.
Light, R. J., Singer, J. D., & Willett, J. B. (1990). By design: Planning research on higher education. Cambridge, MA: Harvard University Press.
Pedhazur, E. J. & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Shadish, W. R., Jr., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.
Wilkinson, L. & the APA Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.