Richard Williams, Notre Dame Sociology

Sociology 73994

Categorical Data Analysis

Richard Williams, Instructor

Fall 2008


NOTE:  My Stata Highlights page includes links to Stata and statistical handouts from my other courses that may interest readers.

This page is under development.  Links will become "live" when they are ready.  Click here if you want to see the online notes and handouts from the last time the course was taught.  Some of these will be updated this semester but the old notes should be fine for anyone who wants to get a head start on methods we haven't gotten to yet.

Stata is in the labs.  You can also order your own personal copy of Stata through the GradPlan package.  I recommend the Stata/IC 10 & Getting Started Manual for $155.  Cheaper and more expensive packages are also available.  Stata 10 is now out, but Stata 9 is fine if you already have it.

 

NOTE: The following special types of files are used on this web page. Some materials are available only to nd.edu users.

PDF  PDF files. Require Adobe Acrobat Reader.  (Get Acrobat Reader)

SPSS  SPSSWIN files.

Stata  Stata 9 files.

Useful sites for learning about Stata and SPSS

Rich Williams' Stata Highlights Page

UCLA's Statistical Computing Resources

RW Suggestions for Using Stata at Notre Dame

UCLA's Stata Starter Kit

RW's Suggested downloads

UCLA's SPSS Starter Kit

Resources for learning Stata

UCLA - How does Stata compare with SAS and SPSS?

The Stata User Support Page

Ben Jann's estout/esttab support page (esttab & estout are great for formatting output from Stata)

Overview.  This course discusses methods and models for the analysis of categorical dependent variables and their applications in social science research. Researchers are often interested in the determinants of categorical outcomes. For example, such outcomes might be binary (lives/dies), ordinal (very likely/ somewhat likely/ not likely), nominal (taking the bus, car, or train to work) or count (the number of times something has happened, such as the number of articles written). When dependent variables are categorical rather than continuous, conventional OLS regression techniques are not appropriate. This course therefore discusses the wide array of methods that are available for examining categorical outcomes.
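To illustrate why OLS is a poor fit for a binary outcome, here is a minimal sketch in Python. It is not one of the course handouts: the data are made up and the helper name ols_fit is hypothetical. It fits a straight line to a 0/1 outcome and collects the fitted values that fall outside the 0 to 1 range that a probability must respect.

```python
# Hypothetical toy data: x is a continuous predictor, y is a binary (0/1) outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_fit(x, y)

# Treating the fitted line as a "linear probability model",
# each fitted value is read as a predicted probability.
fitted = [intercept + slope * xi for xi in x]

# Predicted "probabilities" below 0 or above 1 are impossible values.
out_of_range = [round(p, 3) for p in fitted if p < 0 or p > 1]
print("fitted values:", [round(p, 3) for p in fitted])
print("outside [0, 1]:", out_of_range)
```

With these made-up data the fitted line dips below 0 at the low end of x and rises above 1 at the high end, which is one of the problems that models such as logit are designed to avoid.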

Syllabus

Book Review of Regression Models for Categorical Dependent Variables Using Stata, Second Edition, by Long and Freese. This will provide an overview of the text we are using.

Long (1997) Stata Files

Long and Freese (2006) Stata Files

RW Stata Files

Recommended Reading (ND.Edu Netid is required for access)

Overview of Generalized Linear Models, Maximum Likelihood Estimation

    Introduction to Generalized Linear Models

    Maximum Likelihood Estimation

    exlogistic documentation.  The MLE handout describes problems you can have when samples are small and/or you have one-way causation, such as the case where all females are observed to have a positive outcome.  If, alas, you happen to have such a sample, the new Stata 10 command exlogistic is for you.  Just skim through the documentation so you get the idea.
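The problem described above, where every female has a positive outcome, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not material from the MLE handout: the sample is invented, and the function name log_likelihood is hypothetical. The sketch holds the intercept at the log odds observed among males and shows that the logit log-likelihood keeps rising as the coefficient on female grows, so there is no finite value at which it peaks.

```python
import math

# Hypothetical sample of (female, outcome) pairs.
# Every female has outcome 1; males are mixed (3 ones, 4 zeros).
data = [(1, 1)] * 5 + [(0, 1)] * 3 + [(0, 0)] * 4


def log_likelihood(a, b):
    """Logit log-likelihood with intercept a and coefficient b on female."""
    total = 0.0
    for female, y in data:
        xb = a + b * female
        if y == 1:
            total += -math.log1p(math.exp(-xb))  # log P(y = 1)
        else:
            total += -math.log1p(math.exp(xb))   # log P(y = 0)
    return total


# Fix the intercept at the log odds among males (3 positive, 4 negative).
a_hat = math.log(3 / 4)

# Evaluate the log-likelihood for ever larger coefficients on female.
b_values = (0, 2, 5, 10, 20)
lls = [log_likelihood(a_hat, b) for b in b_values]

for b, ll in zip(b_values, lls):
    print(f"b = {b:>2}: log-likelihood = {ll:.8f}")
```

Each larger value of b gives a slightly higher log-likelihood, with the gains shrinking toward zero. That is the pattern an iterative maximum likelihood routine runs into with such a sample, and it is the kind of situation for which exact methods such as exlogistic are intended.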

Brief Review of Models for Continuous Outcomes

PDF  Review of Multiple Regression (NOTE: I won't talk about this directly in class.  Instead I'll show you how to do things in Stata and ask you questions as we go along.)

reg01.dta - Data file used in the Stata Regression handout

PDF  Using Stata for OLS Regression  (If you are interested, click here for a similar handout using SPSS)

Models for Binomial Outcomes

The following 4 handouts are "repeats" from Soc 639993 (Grad Stats II), and even if you didn't have Stats II with me you may have had similar material in other classes.  Rather than go through these in detail, I want you to prepare answers to these discussion questions before class.  We'll spend added time as necessary on any problem areas.

    Logistic Regression I: Problems with the Linear Probability Model (LPM)

    Logistic Regression II: The Logistic Regression Model (LRM)

    Logistic Regression III: Hypothesis Testing, Comparisons with OLS

    PDF  Using Stata 9 for Logistic Regression 

 

    PDF  Student presentations on binomial outcomes.  I'm going to have you do short presentations on material that I haven't covered before in Stats II.  Dates are tentative.

Supplementary Notes - You will cover much of this material in your class presentations, so I will skip over much of it in class.  You should still go over it and ask questions if you don't understand it.

    The Latent Variable Model In Binary Regressions

    Pseudo R^2, AIC, BIC

    Prelude to Discussion of Standardized Coefficients

    Standardized Coefficients (Don't read until we've gone over the prelude handout)

    Some Comments (and Warnings) about the adjust, prvalue and prtab commands

    Marginal Effects and Discrete Changes

    PDF  Alternatives to logistic regression 

Models for Ordinal Outcomes I: The ordered logit model

    Ordinal Regression I: Overview

    In-Class Problems on Ordinal Regression

        Ordinal Regression II: Hypothesis Testing & Interpreting Results (Don't read this until AFTER we have gone over the in-class problems)

Models for Ordinal Outcomes II: Heterogeneous Choice Models/ Group Comparisons

    Estimating Heterogeneous Choice Models with Stata (Complete paper).  Here is an earlier powerpoint version with handout that covers many (but not all) of the same points.

    Using Heterogeneous Choice Models To Compare Logit and Probit Coefficients Across Groups, Part I   

    Using Heterogeneous Choice Models To Compare Logit and Probit Coefficients Across Groups, Part II

    Using Heterogeneous Choice Models To Compare Logit and Probit Coefficients Across Groups (Complete paper; recommended.  The handouts above include most of the actual Stata code but don't reflect all of the latest revisions.)

Models for Ordinal Outcomes III: Generalized ordered logit models

    Generalized Ordered Logit Models 1: Overview; Using the gologit2 program

    Generalized Ordered Logit Models 1 - Accompanying Handout

    Generalized Ordered Logit Models 2: Interpreting results

    Generalized Ordered Logit Models 2 - Accompanying Handout

Models for Ordinal Outcomes IV: Interval Regression

    Interval Regression

        Supplemental Notes on the intreg Command

        intreg - hypothetical example

Categorical Data Analysis with Complicated Survey Designs   

    Introduction to Survey Data Analysis

    UCLA's (see lower third of page) and StataCorp's FAQs on Survey Data Analysis (Optional; you may want to refer to these if you use the SVY commands)

Models for Multinomial Outcomes

    Multinomial Logit - Overview

    Post-Estimation Commands for mlogit

Models for Count Outcomes

    Count Outcomes, Part I

    Count Outcomes, Part II