Term Project
Sociology/CAPP/HESB 303
Spring 1999
Prof. Dan Myers
Your term project is designed to test both your ability to put the statistics you have learned this semester to practical use and your mastery of STATA. In this project you will use real data to examine real relationships among variables.
Your completed project consists of two parts. The first is the log file from the final STATA run. As you examine the data, you will likely run exploratory analyses and make mistakes along the way. I do not want to see all of that. Once you have the final analysis worked out, start a new log file, rerun your analyses, and print that log to attach to your write-up. I will not specifically be grading your computer output, but it will show me how you have done the work and help me track down any problems.
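As a rough sketch of that final run, the commands below start a fresh log, leave room for the finished analysis, and close the log so it can be printed. The file name termproj.log is only a placeholder, not a required name.

```stata
* Start a clean log for the final run (termproj.log is a hypothetical name).
* The replace option overwrites an earlier log with the same name.
log using termproj.log, replace

* ... rerun only the final commands for your analysis here ...

* Close the log before printing it.
log close
```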
The second part of your project is your research report. This report is a formal write-up of your research problem, your hypotheses, and the evidence either supporting or contradicting them. Specifically, your report should contain the following sections:
Introduction: Describe the general ideas about the research problems and cite any relevant literature/past work on the topic. Why is it interesting/important?
Hypotheses and Data: Lay out the specific research hypotheses you are going to test, briefly describe your data source, and tell about the specific variables used in your test.
Results: Give the evidence in the data for or against your hypotheses. This section should include descriptive information about your variables, graphs, and the results of statistical tests. The results section should make up the bulk of your report.
Discussion: Discuss the implications of your findings. Do they suggest other hypotheses that should be tested? Why do you think things turned out the way they did? Were the data in any way inadequate for the test? How confident are you in your conclusions and why?
Your research report should be about 10 pages in length, not including the computer output. Grading criteria include selecting the correct tests for the variables you use, clarity of the presentation of the data and results, correct identification of the research and null hypotheses, correct identification of the independent and dependent variables, and correct interpretation of your statistical tests. **It makes no difference to your grade whether or not your research hypothesis is supported.**
To conduct the project:
There are three STATA data sets you may choose from for your analysis. The first is the General Social Survey for 1994, the second is the General Social Survey from 1987-91, and the third is a juvenile delinquency survey. Choose one and do your entire analysis using that data set. The data sets are gss94.dta, gss8791.dta, and juvdel.dta, respectively, and are stored in my public directory.
When you do your analysis, YOU CANNOT USE STATA QUEST! The real data sets are too large for it, so you will have to use regular STATA. This means you will have to work in a campus computer cluster. To use STATA for this project, hit the START button, go to programs, basic, statistic and math, and choose STATA 5.0. As soon as it starts up, type the command "quest" and the usual menus will appear. To open a data file, hit open, then use the drive window to navigate to the K: drive, then to nd.edu, then user19, then dmyers, then Public, and choose the name of the data file you want to use.
When you pull up the data files, you will see that they have a lot of variables. You will need to know what these variables are and what the different response codes mean for each variable. For example, abany is a variable in the gss94.dta file, and has the possible responses 1, 2, 8, 9, and 0. You will need to know what each of these codes means in order to proceed. To find out, look at the codebook for the data file you are examining.
These codebooks are on the web page for you to view or download (and are also in the public directory). For both gss94.dta and gss8791.dta, the codebook is gsscode.txt. For juvdel.dta the codebook is juvcode.txt.
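If you prefer typing commands to using the menus, something like the following should load a data set and show which codes actually occur for a variable. The path is assembled from the navigation steps above and is an assumption about how the drive is mapped on your machine; adjust it if STATA cannot find the file. Note that STATA's own codebook command only summarizes what is stored in the data file, so you still need gsscode.txt or juvcode.txt for the meaning of each code.

```stata
* Load the 1994 GSS (path is assumed from the menu steps; adjust as needed).
use "K:\nd.edu\user19\dmyers\Public\gss94.dta", clear

* List the variables in the file with their storage types and labels.
describe

* Show how often each response code of abany occurs.
tabulate abany

* Summarize what the data file itself records about abany.
codebook abany
```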
The first thing you should do before starting your analysis is to examine the codebooks. In fact, you can write up the first two sections of your paper without ever invoking STATA. Once you have your variables picked and your hypotheses laid out, you should determine the levels of measurement for each of your variables and then select the appropriate tests. It is at this point that you should start computing. Run your tests, record your results, and finish writing up the paper.
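To illustrate how the level of measurement drives the choice of test, here is a hedged sketch of three common pairings. The variable names race, sex, educ, and age are assumed to exist in the GSS files (only abany is named in this handout), so check the codebook before using them, and use whichever tests we covered in class for your own variables.

```stata
* Two categorical variables: cross-tabulation with a chi-square test.
tabulate race abany, chi2 row

* Two-category grouping variable and an interval-level outcome: t test.
ttest educ, by(sex)

* Two interval-level variables: correlation and simple regression.
correlate educ age
regress educ age
```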
For your paper, you should examine two pairs of related variables, for a total of four variables. For example, you could examine the relationship between the respondent's income and their opinion about abortion-on-demand. The second, related analysis might be something like their race and their opinion about abortion in the case of rape. Try to have some variables with different levels of measurement so you can use more than one statistical test.
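A minimal sketch of the example above might look like this, with the data file already open. It assumes the codebook shows that codes 8, 9, and 0 of abany are non-answers rather than real opinions, and it assumes the variable names income, race, and abrape; none of these assumptions is confirmed here, so verify each one in gsscode.txt first.

```stata
* Treat non-answer codes as missing (assumes 8, 9, and 0 are not opinions).
replace abany = . if abany == 8 | abany == 9 | abany == 0
replace abrape = . if abrape == 8 | abrape == 9 | abrape == 0

* First pair: income (independent) by opinion on abortion-on-demand.
tabulate income abany, chi2 row

* Second pair: race (independent) by opinion on abortion in the case of rape.
tabulate race abrape, chi2 row
```

Putting the independent variable in the rows and asking for row percentages lets you compare the distribution of opinions across its categories.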