Revised: February 1, 2010 

School of Public Health

University of California, Berkeley

Theory-Based Data Analysis

Group-Study Seminar

PB HLTH 298-64 (Group Study), Spring, 2010

 

 

Instructor:                  Norm Constantine

237 University Hall

(925) 284-8118

nconstantine@berkeley.edu

http://sph.berkeley.edu/faculty/constantine.html

Units:                          2 units (up to 4 units available by special arrangement)

When offered:           Selected Tuesdays, 4:00 to 6:00 p.m.

(1/26, 2/2, 2/9, 2/23, 3/9, 3/16, 3/30, 4/13, 4/27, 5/4)

Location:                    590-L University Hall

CC#                            76679

Participants:              Developed for DrPH students whom I am advising; other doctoral and advanced master’s students are welcome by special arrangement as space permits (max=8)

Syllabus at:               http://crahd.phi.org/PH298-64-Sp09Syllabus.htm

       

Words of Wisdom (aka, instructor’s known biases)

 

·          “The model does not ‘confirm’ causal relationships. Rather it assumes causal links and then tests how strong they would be if the model were a correct representation of reality.” (Shadish, Cook, & Campbell, 2002).

·         “Structural equation modeling is more useful for rejecting false models than for somehow proving whether a given model is in fact true.” (Kline, 1998).

 

Course Description

 

This group-study seminar is intended to assist students in (1) developing their abilities to critically appraise the design and interpretation of multivariable analyses of non-experimental quantitative data, and (2) translating the theory guiding their own dissertation research into a coherent plan for analysis.

 

Five types of analytical strategies for multivariable analysis will be reviewed and critiqued. These include three atheoretical strategies – bivariate (unadjusted), simultaneous (standard), and automated (stepwise, perhaps better classified as an anti-theory strategy), as well as two theory-based strategies – sequential (hierarchical), and elaborative (exclusionary/inclusive). The focus of this seminar will be on these last two theory-based strategies, and on conceptual design and causal inference issues, rather than statistical analysis issues.
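As a rough illustration of how three of these strategies differ in practice, the sketch below fits bivariate, simultaneous, and sequential (hierarchical) regressions to simulated data with NumPy. The data, the variable names (`background`, `exposure`, `outcome`), and the helper functions `r_squared` and `sequential_r2` are all hypothetical and are not taken from the course readings. The order of entry in the sequential model stands in for an order that would be dictated by the investigator's theory.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def sequential_r2(blocks, y):
    """Enter predictor blocks in the given (theory-driven) order.

    Returns a list of (block name, cumulative R^2, increment in R^2).
    """
    results, entered, previous = [], [], 0.0
    for name, block in blocks:
        entered.append(block)
        current = r_squared(np.column_stack(entered), y)
        results.append((name, current, current - previous))
        previous = current
    return results


# Simulated data; every variable here is hypothetical.
rng = np.random.default_rng(0)
n = 500
background = rng.normal(size=n)                    # a distal background factor
exposure = 0.6 * background + rng.normal(size=n)   # focal predictor, correlated with background
outcome = 0.5 * background + 0.3 * exposure + rng.normal(size=n)

# Bivariate (unadjusted): the focal predictor alone.
bivariate = r_squared(exposure, outcome)

# Simultaneous (standard): all predictors entered at once.
simultaneous = r_squared(np.column_stack([background, exposure]), outcome)

# Sequential (hierarchical): background first, then the focal predictor.
steps = sequential_r2([("background", background), ("exposure", exposure)], outcome)

print(f"bivariate R^2 (exposure only): {bivariate:.3f}")
print(f"simultaneous R^2 (both):       {simultaneous:.3f}")
for name, cumulative, increment in steps:
    print(f"after entering {name}: R^2 = {cumulative:.3f} (increment {increment:.3f})")
```

The final cumulative R² of the sequential model matches that of the simultaneous model. What the sequential strategy adds is an order of entry, so the increment credited to each block reflects a theoretical decision rather than the data alone.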

 

The group will meet for nine two-hour sessions. Each session will include (1) discussion of concepts and issues from the current week’s readings, and (2) discussion and critique of students’ own research and/or student-presented published examples from their own public health specialty areas. Both elements are priorities of this course, and as much as possible, the two will be integrated.

 

The etiology of this course is similar to that described by Carol Aneshensel in her preface to our primary text. Hopefully we can build on her and her students’ experiences and insights to achieve some of the same successes:

 

“This text was conceived at a student's preliminary doctoral exam, and its development has been a response to questions posed by students in the graduate course that eventually grew from this brainchild. I was struck, not for the first time, by a technically correct description of a series of statistical techniques that had little relevance to the theory to be tested in the proposed research. Lest students think they have been unfairly singled out, I should mention that this disconnection is also found in the proposals of more seasoned investigators. I was puzzled because the students who struggled most with their data analysis were uniformly bright, articulate in their understanding of theory, and, most perplexingly, well trained in the techniques of multivariate statistics. This pattern suggested that something was missing from the way we were training the next generation of social scientists.

 

“With considerable chutzpah, I started a seminar on multivariate data analysis to rectify this problem. I soon discovered that I had no idea how to teach someone else how to translate his or her theory into a coherent plan for data analysis. Fortunately students in those first few years did not seem to realize this chaotic state of confusion and found that the seminar enabled them to integrate what they had learned over several years in other courses. I was pleased, of course, and more than ready to accept the credit for their astute insights. In retrospect, it is clear students were able to bring their ideas and their analyses closer together because they presented their analyses to their classmates, responded to their critiques, and offered the same in exchange. This book is the result of eavesdropping on their conversations.”

-- Aneshensel (2002)

 

 

 

Prerequisites

 

This is a conceptual applications seminar and is not focused on statistical mechanics. Nevertheless, to benefit from the seminar it is necessary to have sufficient understanding of the basic concepts of multiple linear regression, such as variance and covariance, R², b and Beta coefficients, partial and semi-partial correlation, statistical control, regression assumptions, etc. Prior completion of or concurrent enrollment in PH145 Statistical Analysis of Continuous Outcome Data, PH241 Statistical Analysis of Categorical Data, or another 200-level course that sufficiently covers the fundamentals of multiple linear and/or logistic regression is expected. Alternatively, otherwise qualified and highly motivated students without this background can request by special agreement to minimally meet this prerequisite through an additional credit of independent study using Multiple Regression: A Primer (Allison, 1999). This would need to be completed and verified by the end of February. It is also expected that students have developed dissertation or other research questions and have at least begun work on theoretical frameworks, so that they can apply these to the exercises of this class.

 

Learning Objectives  

 

After successful completion of this course, students will be able to:

 

1. Understand the fundamental differences between predictive and explanatory research;

2. Explain and illustrate the differences between the statistical problem of multicollinearity and the interpretative problem of overlapping covariance;

3. Distinguish between different types of multivariable analytic strategies, and explain the advantages, disadvantages, and appropriate applications for each;

4. Critically appraise the design and interpretation of published multivariable analyses in the student’s specialty area of public health; and

5. Develop a coherent theory-based design for multivariable analysis in the student’s specialty area of public health.

 

Student Responsibilities

 

1. Thoroughly study all assigned readings prior to class, and actively and appropriately participate in all discussions;

2. Identify, present, and critically discuss published examples from students’ specialty area of public health, and contribute to the discussion of examples from other students’ areas; 

3. Develop a theory-based multivariable analysis design based on an appropriate application of a theory-based strategy, for a research question(s) from the student’s own specialty area; and

4. Optionally, implement the above theory-based multivariable analysis design using a relevant data set and appropriate statistical software.

 

Required Texts

 

1. Allison, P. D. (1999). Multiple regression: A primer. Thousand Oaks, CA: Sage, Pine Forge Press. (This is an excellent review of the mechanics and assumptions of multiple regression analysis, from the same series as the Aneshensel book. However, Allison’s approach is not theory-based, and in some places it is downright theoretically confused. Yet, with that caveat in mind, I do highly recommend this clearly written, accessible, and engaging book.)

 

2. Aneshensel, C.S. (2002). Theory-based data analysis for the social sciences. Thousand Oaks, CA: Sage, Pine Forge Press.

 

 

Other Readings (In course outline below)

 

 

Course Outline

February 2

·         Introduction and course overview

·         Prediction vs. explanation

·         Review of the nature and importance of  theory

·         The continuum of small-t to big-T theories

·         Dust bowl empiricism in public health research

·         Confirmation, refutation, and corroboration

·         Brief review of a few basic concepts in multiple regression analysis

·         Multicollinearity vs. overlapping covariance

 

Required Readings (very brief, about 5 pages)

  1. Pedhazur, E. J. (1997). Prediction and explanation (pp. 195-198, 211). In Multiple regression in behavioral research: Explanation and prediction.

 

Supplemental Readings

  1. Allison, P. D. (1999). Multiple Regression: A Primer. Sage: Pine Forge Press.

  2. Cohen, J. (1968). Multiple regression as a general data analytic system. Psychological Bulletin, 70, 426-443.

  3. Pedhazur, E. J. & Schmelkin, L. P. (1991). Multiple regression analysis (and cautions on variance partitioning) (pp. 413-428). In Measurement, design, and analysis: An integrated approach.

February 9

·         Five analytic strategies in multiple regression

·         Role of generic conceptual frameworks

·         Example from parent-adolescent communication

 

Required Readings

  1. Phillips, D.C. (2000). Preface, and Chapter 6 (New philosophy of science).

  2. Tabachnick, B. G. & Fidell, L. S. (2007). Using multivariate statistics (5th edition). New York: Allyn and Bacon. (Major types of multiple regression, pp. 136-144.)

  3. Victora, et al. (1997). The role of conceptual frameworks in epidemiological analysis: A hierarchical approach. International Journal of Epidemiology, 26(1).

 

Supplemental Readings

  1. Phillips, D.C. (2000). Chapter 8 (Popperian rules for research design).

  2. Phillips, D.C. (2000). Chapter 12 (Theories and Laws).

  3. Haack, S. (2003). Defending science -- Within reason: Between scientism and cynicism. New York: Prometheus. Preface, and Chapter 3 (Clues to the puzzle of scientific evidence: A more-so story).

February 23

·         Critical analysis of student-selected examples

 

Required Readings (examples tbd)

 

Supplemental Readings

  1. Phillips, D.C. (2000). Chapter 8 (Popperian rules for research design).

  2. Phillips, D.C. (2000). Chapter 12 (Theories and Laws).

  3. Haack, S. (2003). Defending science -- Within reason: Between scientism and cynicism. New York: Prometheus. Preface, and Chapter 3 (Clues to the puzzle of scientific evidence: A more-so story).

 

 

March 9

 

·         Rosenberg’s elaboration model approach

·         Aneshensel’s exclusionary/inclusive approach

 

Required Readings

  1. Aneshensel (2002), Preface

  2. Aneshensel (2002), Chapter 1: Introduction to Theory-Based Data Analysis

  3. Aneshensel (2002), Chapter 2: The Logic of Theory-Based Data Analysis

March 16

 

·         Associations and relationships

·         The focal relationship

 

Required Readings

  1. Aneshensel (2002): Chapter 3: Associations and Relationships

  2. Aneshensel (2002): Chapter 4: The Focal Relationship: Demonstrating Internal Validity

March 30

·         Ruling out alternative explanations

 

Required Readings

  1. Aneshensel (2002): Chapter 5: Ruling Out Alternative Explanations: Spurious and Control Variables

  2. Aneshensel (2002): Chapter 6: Ruling Out Alternative Explanations: Additional Independent Variables

April 13

·         Elaborating an explanation

 

Required Readings

1.    Aneshensel (2002): Chapter 7: Elaborating an Explanation: Antecedent, Intervening, and Consequent Variables

 

2.    Frazier, P.A., Tix, A.P., & Barron, K.E. (2004). Testing moderator and mediator effects in counseling psychology research. Journal of Counseling Psychology, 51, 115-134.

April 27

·         Conditions of  Influence

 

Required Readings

  1. Aneshensel (2002): Chapter 8: Conditions of Influence: Effect Modification

  2. Aneshensel (2002): Chapter 9: Synthesis and Comment

May 4

·         Critiques of the sequential and elaboration strategies

·         The lingering problem of theory under-determination in both strategies

·         Theory-based celebration (Faculty Club)

 

 

Words of Wisdom (continued, full versions and references)

 

Wilkinson & the APA Task Force on Statistical Inference, 1999:

 

“The enormous variety of modern quantitative methods leaves researchers with the nontrivial task of matching analysis and design to the research question. Although complex designs and state-of-the-art methods are sometimes necessary to address research questions effectively, simpler classical approaches often can provide elegant and sufficient answers to important questions. Do not choose an analytic method to impress your readers or to deflect criticism. If the assumptions and strength of a simpler method are reasonable for your data and research problem, use it. Occam's razor applies to methods as well as to theories.”

 

Fred Lord, 1967, p. 305; quoted in Pedhazur & Schmelkin, 1991:

 

"With the data usually available for such studies, there is simply no logical or statistical procedure that can be counted on to make proper allowances for uncontrolled preexisting differences between groups. The researcher wants to know how the groups would have compared if there had not been preexisting uncontrolled differences. The usual research study of this type is attempting to answer a question that simply cannot be answered in any rigorous way on the basis of the available data."

 

Donald Campbell, 1989:

 

“More and more I have come to the conclusion that the core of the scientific method is not experimentation per se but rather the strategy connoted by the phrase ‘plausible rival hypotheses.’ This strategy may start its puzzle solving with evidence, or it may start with hypothesis. Rather than presenting this hypothesis or evidence in the context-independent manner of positivistic confirmation (or even of postpositivistic corroboration), it is presented instead in extended networks of implications that (although never complete) are nonetheless crucial to its scientific evaluation.

 

“This strategy includes making explicit other implications of the hypothesis for other available data and reporting how these fit. It also includes seeking out rival explanations of the focal evidence and examining their plausibility. The plausibility of these rivals is usually reduced by ramification extinction, that is, by looking at their other implications on other data sets and seeing how well these fit. How far these two potentially endless tasks are carried depends on the scientific community of the time and what implications and plausible rival hypotheses have been made explicit. It is on such bases that successful scientific communities achieve effective consensus and cumulative achievements, without ever reaching foundational proof. Yet, these characteristics of the successful sciences were grossly neglected by the logical positivists and are underpracticed by the social sciences, quantitative or qualitative.”

 

References

 

Campbell, D. T. (1989). Foreword. In R. K. Yin (2003), Case study research: Design and methods (3rd ed.). Thousand Oaks, CA: Sage. (First edition 1989.)

Hughes, J. N. (2000). The essential role of theory in the science of treating children: Beyond empirically supported treatments. Journal of School Psychology, 38, 301–330.

Kline, R. B. (1998). Principles and practice of structural equation modeling. NY: Guilford.

 

Light, R. J., Singer, J. D., & Willett, J. B. (1990). By design: Planning research on higher education. Cambridge, MA: Harvard University Press.

Pedhazur, E. J. & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Shadish, W. R., Jr., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.

 

Wilkinson, L. & the APA Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.