Statistics Assignment
Due: November 11th, 3:00 PM PST
1. For the population of low birth weight infants, a significant linear relationship was found to exist between systolic blood pressure and gestational age (the data sets lowbwt1.csv and lowbwt2.txt; both files contain variables for the same samples, in the same order, with different but overlapping columns). The measurements of systolic blood pressure are saved under the variable name sbp, and the corresponding gestational ages under gestage. Also contained in the data set is the variable apgar5, the five-minute apgar score for each infant. (The apgar score is an indicator of a child's general state of health five minutes after birth; although it is actually an ordinal measurement, it is often treated as if it were continuous.) (11 marks)
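Because the two files describe the same samples in the same order, they can be combined row by row. The following is a minimal sketch of that idea using only the Python standard library. The inline text, the id column, the tab delimiter for the .txt file, and the helper names read_rows and merge_by_position are all assumptions made for illustration; the real files may be laid out differently, and the values shown are made up.

```python
import csv
import io

# Made-up stand-ins for lowbwt1.csv and lowbwt2.txt (assumed layouts, not real data).
csv_text = "id,sbp,gestage\n1,43,29\n2,51,31\n3,42,33\n"
txt_text = "id\tgestage\tapgar5\tsex\n1\t29\t7\t1\n2\t31\t8\t0\n3\t33\t0\t1\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def merge_by_position(left, right):
    """Combine two row lists that describe the same samples in the same order."""
    if len(left) != len(right):
        raise ValueError("files have different numbers of rows")
    merged = []
    for i, (a, b) in enumerate(zip(left, right)):
        # Overlapping columns should agree; a mismatch suggests misaligned rows.
        for key in a.keys() & b.keys():
            if a[key] != b[key]:
                raise ValueError(f"row {i}: column {key!r} differs")
        merged.append({**a, **b})
    return merged


rows = merge_by_position(read_rows(csv_text, ","), read_rows(txt_text, "\t"))
print(rows[0])
```

Values are compared as strings here, so the overlap check would flag 31 against 31.0; converting to numbers first would be a reasonable refinement.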
(a) (2 marks) Create a scatter plot of systolic blood pressure versus five-minute apgar score. Does there appear to be a linear relationship between these two variables?
(b) (2 marks) Using systolic blood pressure as the response and gestational age and apgar score as the explanatory variables, fit the multiple linear regression
$\hat{y} = a + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$.
Interpret $\hat{\beta}_1$, the estimated coefficient of gestational age. What does it mean in words? Similarly, interpret $\hat{\beta}_2$, the estimated coefficient of five-minute apgar score.
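One way to see what part (b) asks for is a small least-squares sketch. The block below is a minimal illustration in plain Python, not a prescribed method: solve and fit_ols are hypothetical helper names, and the numbers are made up and noise-free (they are not the lowbwt data), so the fit simply recovers the coefficients used to generate them.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for y = a + b1*x1 + b2*x2."""
    X = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations: (X'X) coef = X'y
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)


# Made-up, noise-free values generated from arbitrary coefficients (10, 1.2, -0.5).
gestage = [25, 27, 29, 31, 33, 35]
apgar5 = [3, 8, 5, 9, 6, 7]
sbp = [10 + 1.2 * g - 0.5 * s for g, s in zip(gestage, apgar5)]

a, b1, b2 = fit_ols(gestage, apgar5, sbp)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

In this notation b1 is read as the estimated change in mean sbp for a one-week increase in gestage with apgar5 held fixed, and b2 as the estimated change for a one-point increase in apgar5 with gestage held fixed.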
(c) (1 mark) What is the estimated mean systolic blood pressure for the population of low birth weight infants whose gestational age is 31 weeks and whose five-minute apgar score is 7?
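As a worked form of part (c), assuming x_1 denotes gestational age in weeks and x_2 the five-minute apgar score in the fitted model from part (b), the estimate is obtained by substituting the two values into the fitted equation:

```latex
\[
\hat{y}\,\big|_{x_1 = 31,\; x_2 = 7} \;=\; a + 31\,\hat{\beta}_1 + 7\,\hat{\beta}_2
\]
```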
(d) (2 marks) Test the null hypothesis:
$H_0 : \beta_2 = 0$
at the 0.05 level of significance. What do you conclude?
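A sketch of the usual test statistic for part (d), assuming the standard linear-model conditions (independent, normally distributed errors with constant variance) and writing n for the number of infants; the symbol se for the estimated standard error is notation introduced here, not taken from the assignment:

```latex
\[
t \;=\; \frac{\hat{\beta}_2 - 0}{\widehat{\mathrm{se}}(\hat{\beta}_2)}
\;\sim\; t_{\,n-3} \quad \text{under } H_0,
\qquad
\text{reject } H_0 \text{ at the 0.05 level if } |t| > t_{\,n-3,\;0.975}.
\]
```

The degrees of freedom are n - 3 because the model estimates three coefficients: the intercept and the two slopes.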
(e) (2 marks) Construct a plot of the residuals versus the fitted values of systolic blood pressure. What does this plot tell you about the fit of the model to the observed data?
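The quantities plotted in part (e) can be sketched as follows. This is a minimal illustration only: fitted_and_residuals is a hypothetical helper name, and both the coefficients and the data are made up (the coefficients are not estimated from anything), chosen so the arithmetic is easy to follow.

```python
def fitted_and_residuals(coefs, x1, x2, y):
    """Return fitted values a + b1*x1 + b2*x2 and residuals (observed - fitted)."""
    a, b1, b2 = coefs
    fitted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
    residuals = [obs - fit for obs, fit in zip(y, fitted)]
    return fitted, residuals


# Made-up coefficients and observations, for illustration only.
coefs = (10.0, 1.0, 0.5)
gestage = [28, 30, 34]
apgar5 = [4, 8, 6]
sbp = [40.0, 45.0, 50.0]

fitted, residuals = fitted_and_residuals(coefs, gestage, apgar5, sbp)
for fit, res in zip(fitted, residuals):
    # Each (fitted, residual) pair is one point on the residual plot.
    print(fit, res)
```

The pairs can then be passed to any plotting tool, with fitted values on the horizontal axis and residuals on the vertical axis.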
2. The data set lowbwt also contains sex, a dichotomous random variable designating the birth sex assigned to each infant. (4 marks)
(a) (2 marks) Add the indicator variable sex (where 1 represents male and 0 female) to the model that contains gestational age. Given two infants with identical gestational ages, one assigned male and the other female, which would tend to have the higher systolic blood pressure? By how much, on average?
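The model in part (a) can be written out as below. The symbols x_3 and \hat{\beta}_3 for the sex indicator and its coefficient are labels assumed here for illustration; the assignment does not name them.

```latex
\[
\hat{y} = a + \hat{\beta}_1 x_1 + \hat{\beta}_3 x_3,
\qquad
x_3 = \begin{cases} 1 & \text{male} \\ 0 & \text{female} \end{cases}
\]
\[
\hat{y}_{\text{male}} - \hat{y}_{\text{female}} = \hat{\beta}_3
\quad \text{for two infants with the same gestational age } x_1 .
\]
```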
(b) (2 marks) Add to the model a third explanatory variable that is the interaction between gestational age and birth sex. Does gestational age have a different effect on systolic blood pressure depending on the birth sex of the infant?
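A sketch of the interaction model, writing x_1 for gestational age and x_3 for the sex indicator (1 male, 0 female); the labels x_3, \hat{\beta}_3 and \hat{\beta}_4 are assumed here for illustration and do not appear in the assignment:

```latex
\[
\hat{y} = a + \hat{\beta}_1 x_1 + \hat{\beta}_3 x_3 + \hat{\beta}_4 \, x_1 x_3
\]
\[
\text{female } (x_3 = 0):\quad \hat{y} = a + \hat{\beta}_1 x_1,
\qquad
\text{male } (x_3 = 1):\quad \hat{y} = (a + \hat{\beta}_3) + (\hat{\beta}_1 + \hat{\beta}_4)\, x_1
\]
```

In this form the slope for gestational age differs between the two groups by \hat{\beta}_4, so the question amounts to testing H_0 : \beta_4 = 0.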