School of Public Health
University of California, Berkeley
Theory-Based Data Analysis Group-Study Seminar
PB HLTH 298-64 (Group Study), Spring 2010
Instructor: Norm Constantine
237 University Hall
http://sph.berkeley.edu/faculty/constantine.html
Units: 2 units (up to 4 units available by special arrangement)
When offered: Selected Tuesdays, 4:00 to 6:00 p.m.
(1/26, 2/2, 2/9, 2/23, 3/9, 3/16, 3/30, 4/13, 4/27, 5/4)
Location: 590-L University Hall
CC# 76679
Participants: Developed for DrPH students whom I am advising; other doctoral and
advanced master’s students are welcome by special arrangement as space permits
(max = 8)
Syllabus at: http://crahd.phi.org/PH298-64-Sp09Syllabus.htm
Words of Wisdom (aka, instructor’s known biases)
· “The model does not ‘confirm’ causal relationships. Rather it assumes causal
links and then tests how strong they would be if the model were a correct
representation of reality.” (Shadish, Cook, & Campbell, 2002).
· “Structural equation modeling is more useful for rejecting false models than
for somehow proving whether a given model is in fact true.” (Kline, 1998).
Course Description
This group-study seminar is intended to assist students in (1) developing their
abilities to critically appraise the design and interpretation of multivariable
analyses of non-experimental quantitative data, and (2) translating the theory
guiding their own dissertation research into a coherent plan for analysis.
Five types of analytical strategies for multivariable analysis will be reviewed
and critiqued. These include three atheoretical strategies – bivariate
(unadjusted), simultaneous (standard), and automated (stepwise, perhaps better
classified as an anti-theory strategy), as well as two theory-based strategies
– sequential (hierarchical) and elaborative (exclusionary/inclusive). The
focus of this seminar will be on these last two theory-based strategies, and on
conceptual design and causal inference issues, rather than statistical analysis
issues.
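As a rough illustration of how the sequential (hierarchical) strategy differs from a bivariate analysis, the following minimal Python sketch enters a background variable first and then asks how much additional variance a focal variable explains. Everything in it is an assumption made for illustration rather than course material: the data are simulated, and the names (background, focal, outcome, r_squared) and the effect sizes are hypothetical. The order of entry is taken as given here; in practice it would have to be justified by theory.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 from an ordinary least squares fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    k = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(coefs[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


# Simulated (hypothetical) data: the focal variable overlaps with the background variable.
random.seed(0)
n = 500
background = [random.gauss(0, 1) for _ in range(n)]
focal = [0.6 * b + random.gauss(0, 1) for b in background]
outcome = [0.5 * b + 0.4 * f + random.gauss(0, 1)
           for b, f in zip(background, focal)]

# Bivariate (unadjusted) strategy: the focal variable on its own.
r2_focal_alone = r_squared([focal], outcome)

# Sequential (hierarchical) strategy: background first, then add the focal variable.
r2_step1 = r_squared([background], outcome)
r2_step2 = r_squared([background, focal], outcome)
delta_r2 = r2_step2 - r2_step1  # variance uniquely added by the focal variable

print(f"R^2, focal alone:          {r2_focal_alone:.3f}")
print(f"R^2, step 1 (background):  {r2_step1:.3f}")
print(f"R^2, step 2 (+ focal):     {r2_step2:.3f}")
print(f"Increment in R^2 at step 2: {delta_r2:.3f}")
```

Because the two predictors share variance in this simulation, the increment at step 2 comes out smaller than the R^2 for the focal variable alone. How that shared variance should be attributed is a conceptual question that the arithmetic cannot settle.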
The group will meet for nine two-hour sessions. Each session will include (1)
discussion of concepts and issues from the current week’s readings, and (2)
discussion and critique of students’ own research and/or student-presented
published examples from their own public health specialty areas. Both elements
are priorities of this course, and as much as possible, the two will be
integrated.
The etiology of this course is similar to that described by Carol Aneshensel in
her preface to our primary text. Hopefully we can build on her and her students’
experiences and insights to achieve some of the same successes:
“This text was conceived at a student's
preliminary doctoral exam, and its development has been a response to questions
posed by students in the graduate course that eventually grew from this
brainchild. I was struck, not for the first time, by a technically correct
description of a series of statistical techniques that had little relevance to
the theory to be tested in the proposed research. Lest students think they have
been unfairly singled out, I should mention that this disconnection is also
found in the proposals of more seasoned investigators. I was puzzled because
the students who struggled most with their data analysis were uniformly bright,
articulate in their understanding of theory, and, most perplexingly, well
trained in the techniques of multivariate statistics. This pattern suggested
that something was missing from the way we were training the next generation of
social scientists.
“With considerable chutzpah, I started a
seminar on multivariate data analysis to rectify this problem. I soon
discovered that I had no idea how to teach someone else how to translate his or
her theory into a coherent plan for data analysis. Fortunately students in
those first few years did not seem to realize this chaotic state of confusion
and found that the seminar enabled them to integrate what they had learned over
several years in other courses. I was pleased, of course, and more than ready
to accept the credit for their astute insights. In retrospect, it is clear
students were able to bring their ideas and their analyses closer together
because they presented their analyses to their classmates, responded to their
critiques, and offered the same in exchange. This book is the result of
eavesdropping on their conversations.”
-- Aneshensel (2002)
Prerequisites
This is a conceptual applications seminar and is not focused on the mechanics of
statistical analysis. Nevertheless, to benefit from the seminar it is necessary
to have sufficient understanding of the basic concepts of multiple linear
regression, such as variance and covariance, R², b and Beta coefficients,
partial and semi-partial correlation, statistical control, and regression
assumptions. Prior completion of or concurrent enrollment in PH145 Statistical
Analysis of Continuous Outcome Data, PH241 Statistical Analysis of Categorical
Data, or another 200-level course that sufficiently covers the fundamentals of
multiple linear and/or logistic regression is expected. Alternatively, otherwise
qualified and highly motivated students without this background can request, by
special agreement, to minimally meet this prerequisite through an additional
credit of independent study using Multiple Regression: A Primer (Allison, 1999).
This would need to be completed and verified by the end of February. It is also
expected that students have developed dissertation or other research questions
and have at least begun work on theoretical frameworks, so that they can apply
these to the exercises of this class.
Learning Objectives
After successful completion of this course, students will be able to:
1. Understand the fundamental differences between predictive and explanatory
research;
2. Explain and illustrate the differences between the statistical problem of
multicollinearity and the interpretative problem of overlapping covariance;
3. Distinguish between different types of multivariable analytic strategies, and
explain the advantages, disadvantages, and appropriate applications for each;
4. Critically appraise the design and interpretation of published multivariable
analyses in the student’s specialty area of public health; and
5. Develop a coherent theory-based design for multivariable analysis in the
student’s specialty area of public health.
Student Responsibilities
1. Thoroughly study all assigned readings prior to class, and actively and
appropriately participate in all discussions;
2. Identify, present, and critically discuss published examples from the
student’s own specialty area of public health, and contribute to the discussion
of examples from other students’ areas;
3. Develop a theory-based multivariable analysis design, based on an appropriate
application of a theory-based strategy, for a research question(s) from the
student’s own specialty area; and
4. Optionally, implement the above theory-based multivariable analysis design
using a relevant data set and appropriate statistical software.
Required Texts
1. Allison, P. D. (1999). Multiple regression: A primer. Thousand Oaks, CA: Sage,
Pine Forge Press. (This is an excellent review of the mechanics and assumptions
of multiple regression analysis, from the same series as the Aneshensel book.
However, Allison’s approach is not theory-based, and in some places it is
downright theoretically confused. Yet, with that caveat in mind, I do highly
recommend this clearly written, accessible, and engaging book.)
2. Aneshensel, C. S. (2002). Theory-based data analysis for the social sciences.
Thousand Oaks, CA: Sage, Pine Forge Press.
Other Readings (in course outline below)
Course Outline
February 2
· Introduction and course overview
· Prediction vs. explanation
· Review of the nature and importance of theory
· The continuum of small-t to big-T theories
· Dust bowl empiricism in public health research
· Confirmation, refutation, and corroboration
· Brief review of a few basic concepts in multiple regression analysis
· Multicollinearity vs. overlapping covariance
Required Readings (very brief, about 5 pages)
Supplemental Readings
February 9
· Five analytic strategies in multiple regression
· Role of generic conceptual frameworks
· Example from parent-adolescent communication
Required Readings
Supplemental Readings
February 23
· Critical analysis of student-selected examples
Required Readings (examples tbd)
Supplemental Readings
March 9
· Rosenberg’s elaboration model approach
· Aneshensel’s exclusionary/inclusive approach
Required Readings
March 16
· Associations and relationships
· The focal relationship
Required Readings
March 30
· Ruling out alternative explanations
Required Readings
April 13
· Elaborating an explanation
Required Readings
1. Aneshensel (2002): Chapter 7: Elaborating an Explanation: Antecedent,
Intervening, and Consequent Variables
2. Frazier, P. A., Tix, A. P., & Barron, K. E. (2004). Testing moderator and
mediator effects in counseling psychology research. Journal of Counseling
Psychology, 51, 115-134.
April 27
· Conditions of Influence
Required Readings
May 4
· Critiques of the sequential and elaboration strategies
· The lingering problem of theory under-determination in both strategies
· Theory-based celebration (Faculty Club)
Words of Wisdom (continued, full versions and references)
Wilkinson & the APA Task Force on Statistical Inference, 1999:
“The enormous variety of modern quantitative methods leaves researchers with the
nontrivial task of matching analysis and design to the research question.
Although complex designs and state-of-the-art methods are sometimes necessary
to address research questions effectively, simpler classical approaches often can
provide elegant and sufficient answers to important questions. Do not choose an
analytic method to impress your readers or to deflect criticism. If the
assumptions and strength of a simpler method are reasonable for your data and
research problem, use it. Occam's razor applies to methods as well as to
theories.”
Fred Lord, 1967, p. 305; quoted in Pedhazur & Schmelkin, 1991:
"With the data usually available for such
studies, there is simply no logical or statistical procedure that can be
counted on to make proper allowances for uncontrolled preexisting differences
between groups. The researcher wants to know how the groups would have compared
if there had not been preexisting uncontrolled differences. The usual research
study of this type is attempting to answer a question that simply cannot be
answered in any rigorous way on the basis of the available data."
Donald Campbell, 1989:
“More and more I have come to the conclusion that the core of the scientific method
is not experimentation per se but rather the strategy connoted by the phrase
‘plausible rival hypotheses.’ This strategy may start its puzzle solving with
evidence, or it may start with hypothesis. Rather than presenting this
hypothesis or evidence in the context-independent manner of positivistic
confirmation (or even of postpositivistic corroboration), it is presented
instead in extended networks of implications that (although never complete) are
nonetheless crucial to its scientific evaluation.
“This strategy includes making explicit other implications of the hypothesis for
other available data and reporting how these fit. It also includes seeking out
rival explanations of the focal evidence and examining their plausibility. The
plausibility of these rivals is usually reduced by ramification extinction,
that is, by looking at their other implications on other data sets and seeing
how well these fit. How far these two potentially endless tasks are carried
depends on the scientific community of the time and what implications and
plausible rival hypotheses have been made explicit. It is on such bases that
successful scientific communities achieve effective consensus and cumulative
achievements, without ever reaching foundational proof. Yet, these
characteristics of the successful sciences were grossly neglected by the
logical positivists and are underpracticed by the social sciences, quantitative
or qualitative.”
References
Campbell, D. T. (1989). Foreword to R. K. Yin (2003), Case study research design and methods (3rd ed.). Thousand Oaks, CA: Sage. (First edition 1989.)
Hughes, J. N. (2000). The essential role of theory in the science of treating children: Beyond empirically supported treatments. Journal of School Psychology, 38, 301–330.
Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford.
Light, R. J., Singer, J. D., & Willett, J. B. (1990). By design: Planning research on higher education. Cambridge, MA: Harvard University Press.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates.
Shadish, W. R., Jr., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.
Wilkinson, L., & the APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.