1. Given the following observations of quantitative variables X and Y:
x= 0, 1, 2, 3, 15
y= 3, 4, 6, 10, 0
(a) Make a scatterplot of the data on the axes. Circle the most influential observation. (4 points)
(b) Determine the LSRL of Y on X. Draw this line carefully on your scatterplot. (4 points)
(c) What is the definition of a regression outlier? (4 points)
(d) Which data point is the biggest regression outlier? (4 points)
(e) What is the residual at the point identified in part (d)? (4 points)
(f) Construct a residual plot for these data. (4 points)
(g) Interpret your residual plot, i.e., what does the residual plot tell you? (4 points)
2. Anthropologists must often estimate from human remains how tall the person was when alive. Carla is studying how overall height can be predicted from the length of leg bone in a group of 36 living males. The data show that the bone lengths have mean 45.9 cm and standard deviation 4.2 cm, the overall heights have mean 172.7 cm and standard deviation 8.14 cm, and the correlation between bone length and height is 0.914.
(a) What is the slope of the LSRL of height on bone length? (6 points)
(b) About what percent of the observed variation in the heights of the men can be explained by the linear regression of height on bone length? (4 points)
(c) Based on your answer to (b), how would you describe the goodness of linear fit? (4 points)
(d) Determine the equation of the LSRL of height on bone length. (4 points)
3. Consider the following scatterplot:
Provide an approximate equation for the regression line that has been drawn on the plot. (6 points)
a) This is the scatterplot of Y against X.
To find the most influential observation, we fit a trend line through the data. A trend line is a straight line that passes as close as possible to all of the data points, without necessarily going through any of them. With all five points included, the trend line slopes gently downward.
Because the line has to pass near every observation, the point (15,0) pulls it down on the right. If we did not have the last data point (15,0), the trend line would be very different: the four remaining points rise steadily from (0,3) to (3,10). Here is the scatterplot without the last point.
The most influential observation is the one whose deletion from the dataset noticeably changes the result. Hence our most influential observation is (15,0).
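The deletion test described above can be checked numerically. The following is a minimal sketch, not part of the original answer: it assumes the trend line is the ordinary least-squares line and measures "change in the result" as the change in slope. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


xs = [0, 1, 2, 3, 15]
ys = [3, 4, 6, 10, 0]

_, full_slope = fit_line(xs, ys)

# Refit with each point deleted in turn and record how far the slope moves.
# (Assumption: slope change is used here as the measure of influence.)
slope_change = {}
for i in range(len(xs)):
    rest_x = xs[:i] + xs[i + 1:]
    rest_y = ys[:i] + ys[i + 1:]
    _, slope = fit_line(rest_x, rest_y)
    slope_change[(xs[i], ys[i])] = abs(slope - full_slope)

most_influential = max(slope_change, key=slope_change.get)
print(most_influential, round(slope_change[most_influential], 2))
```

Deleting (15,0) moves the slope from about -0.34 to 2.3, far more than deleting any other point, which agrees with the choice of (15,0) as the most influential observation.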
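b) The trend line fitted to all five observations is the least-squares regression line (LSRL) of Y on X: the line that minimizes the sum of the squared vertical distances from the points to the line. It does not have to pass through any of the observations. For these data the LSRL is approximately ŷ = 6.01 - 0.336x.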
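For reference, the LSRL coefficients for the five data points can be worked out by hand with the standard least-squares formulas. The symbols S_xx and S_xy are notation introduced here for the sums of squares and cross-products; they do not appear in the original question.

```latex
\begin{aligned}
\bar{x} &= \tfrac{0+1+2+3+15}{5} = 4.2, \qquad
\bar{y} = \tfrac{3+4+6+10+0}{5} = 4.6 \\
S_{xx} &= \sum (x_i-\bar{x})^2 = 150.8, \qquad
S_{xy} = \sum (x_i-\bar{x})(y_i-\bar{y}) = -50.6 \\
b &= \frac{S_{xy}}{S_{xx}} = \frac{-50.6}{150.8} \approx -0.336 \\
a &= \bar{y} - b\,\bar{x} = 4.6 + 0.336 \times 4.2 \approx 6.01 \\
\hat{y} &= a + b\,x \approx 6.01 - 0.336\,x
\end{aligned}
```

To draw the line on the scatterplot, plot two points on it, for example (0, 6.01) and (15, 0.98), and join them.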
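c) A regression outlier is a data point that lies far from the regression line in the vertical direction, that is, a point with a large residual. An influential point is not necessarily a regression outlier: a point with an extreme x-value can pull the line toward itself and end up with a small residual.
d) Data point (3,10) is the biggest regression outlier, because it has the largest residual in absolute value (about 5.0). The point (15,0) is unusual, since no other x-value is near 15 and no other y-value is near 0, but it pulls the LSRL toward itself, so its residual is only about -1.0. It is the most influential observation rather than the biggest regression outlier.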
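One way to check which point lies farthest from the fitted line, and to get the values needed for parts (e) and (f), is to compute every residual directly. This is a sketch under the assumption that "regression outlier" means the point with the largest residual in absolute value; the variable names are illustrative only.

```python
xs = [0, 1, 2, 3, 15]
ys = [3, 4, 6, 10, 0]

# Least-squares fit of y on x, computed from the textbook formulas.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, y, r in zip(xs, ys, residuals):
    print(f"({x}, {y}): residual = {r:.2f}")

# The point with the largest |residual| is the biggest regression outlier.
biggest = max(zip(xs, ys, residuals), key=lambda t: abs(t[2]))
print("largest residual at", biggest[:2])
```

The residuals come out to roughly -3.01, -1.67, 0.66, 5.00 and -0.98, so the largest is at (3,10). Plotting these residuals against x gives the residual plot for part (f). The first four residuals climb steadily from negative to positive, a clear pattern that suggests a single straight line does not describe these data well.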