Soc 593 Statistics II
Spring 2001
Assignment 5: Regression Diagnostics
Due: Wednesday, Feb 28, 2001

Part One Theoretical Issues on Measurement Error and Model Mis-specification

1. Explain what bias and efficiency mean for an estimator. (Reference: Allison, Chap. 6; Berry/Feldman, Chap. 1.)

2. Which of the following statements about random measurement error is false? Briefly state why it is false.

A. Random measurement error in a variable increases the standard deviation of that variable.
B. Random measurement error in an independent variable produces downward (toward zero) bias in the slope coefficient of that independent variable.
C. Random measurement error drags the estimate of the Pearson correlation coefficient (r) away from zero, which is usually referred to as an "upward bias" in the correlation coefficient.
D. Random measurement error reduces the t-score of the slope coefficient as well as the R-squared, thus making the model less efficient.

3. SHORT ANSWER: What are the three kinds of model mis-specification? What are the consequences of each kind, i.e., does it affect the bias or the efficiency of the estimator (the coefficients)? How?
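
The distinction between bias and efficiency that runs through the questions above can be illustrated with a small simulation. The sketch below is only an illustration in Python, not part of the Stata work required for this assignment. The population (normal, mean 0, standard deviation 1), the sample size, the number of replications, and the seed are all assumptions chosen for the example. It compares two variance estimators (dividing by n versus n - 1) to show bias, and compares the sample mean with the sample median to show efficiency.

```python
import random
import statistics

random.seed(593)  # arbitrary seed, used only so the run is reproducible

# Assumed population and simulation settings (not taken from the assignment).
TRUE_MEAN, TRUE_SD = 0.0, 1.0
N, REPS = 25, 4000

means, medians, var_n, var_n1 = [], [], [], []
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))
    var_n.append(statistics.pvariance(sample))   # divides by n
    var_n1.append(statistics.variance(sample))   # divides by n - 1

# Bias: distance between the average estimate and the true value.
bias_var_n = statistics.fmean(var_n) - TRUE_SD ** 2
bias_var_n1 = statistics.fmean(var_n1) - TRUE_SD ** 2

# Efficiency: sampling variance of two estimators of the same center.
spread_mean = statistics.pvariance(means)
spread_median = statistics.pvariance(medians)

print(f"bias, variance dividing by n:     {bias_var_n:+.4f}")
print(f"bias, variance dividing by n - 1: {bias_var_n1:+.4f}")
print(f"sampling variance of the mean:    {spread_mean:.4f}")
print(f"sampling variance of the median:  {spread_median:.4f}")
```

In a run of this kind, the estimator that divides by n should fall short of the true variance on average, while the n - 1 version should center on it. The mean and the median both center on the true value here, but the median varies more from sample to sample, so it is the less efficient of the two under these assumptions.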
 
Part Two Regression Diagnostics: Model Specification, Nonlinear Relationship, and Heteroskedasticity
Data: K:\nd.edu\user22\yli\Public\593sp01\Data\Hamilton\nations.dta

Task: The data set contains information on 109 countries, collected for the year 1985. You are interested in examining the predictors of average life expectancy in these countries (LIFE, life expectancy at birth). Existing research suggests that:

Yhat(life) = a + b1 X1 + b2 X2 + b3 X3 + b4 X4
where X1 is BIRTH (crude birth rate, i.e., number of births per 1000 people), X2 is INFMORT (infant mortality, i.e., number of infant deaths per 1000 live births), X3 is URBAN (percent of population in urban areas), and X4 is FOOD (per capita daily consumption of calories). Having learned a plethora of regression diagnostic techniques, you decide to diagnose the model. In doing so, you will try to answer the following questions:

1. Is the model correctly specified? That is, are there any extraneous variables that should not be in the model? Are any important variables omitted? For example, is average education level a significant predictor of life expectancy?

2. Are the relationships linear? Do scatterplots of Y against each X show a linear relationship between the variables?

3. Are there any interaction effects between the independent variables?

4. Are any of the OLS assumptions about the error term violated? That is, is there a problem of heteroskedasticity? Skewness? Kurtosis?

Use proper procedures and tests in Stata to diagnose the model and answer the questions. Hand in relevant Stata results and graphs with your typed discussion.