Question

| Avg Outside Temp | Attic Insulation (inches) | Age of Furnace (years) | Square Footage | Avg Heating Cost |
| --- | --- | --- | --- | --- |
| 29 | 5 | 4 | 1900 | 198 |
| 8 | 6 | 7 | 2800 | 355 |
| 6 | 10 | 9 | 2500 | 291 |
| 22 | 8 | 11 | 2000 | 230 |
| 55 | 2 | 4 | 1300 | 121 |
| 36 | 2 | 5 | 2100 | 250 |
| 28 | 4 | 9 | 2400 | 360 |
| 36 | 7 | 2 | 2300 | 164 |
| 59 | 5 | 9 | 1300 | 42 |
| 64 | 5 | 6 | 1500 | 90 |
| 19 | 4 | 8 | 2300 | 271 |
| 57 | 5 | 3 | 1400 | 96 |
| 39 | 7 | 11 | 1900 | 187 |
| 25 | 9 | 8 | 2100 | 235 |
| 28 | 6 | 4 | 1800 | 138 |
| 53 | 11 | 2 | 1200 | 71 |
| 47 | 5 | 2 | 2000 | 206 |
| 20 | 4 | 14 | 2900 | 398 |
| 39 | 4 | 6 | 2600 | 319 |
| 60 | 8 | 6 | 1500 | 72 |

*M2_A2.* Western Home Inspections is a home inspection
service that provides prospective homebuyers with a thorough
assessment of the major systems in a house prior to the execution
of the purchase contract. Prospective homebuyers often ask the
company for an estimate of the average monthly heating cost of the
home during the winter. To answer this question, the company wants
to build a regression model to help predict the average monthly
heating cost (Y) as a function of the average outside temperature
in the winter (X1), the amount of attic insulation in the house
(X2), the age of the furnace in the house (X3), and the size of the
house measured in square feet (X4). Data on 20 homes has been
recorded and is shown in Worksheet A2. The company wants to build a
regression model to estimate the average monthly heating cost based
on outside temperature, attic insulation, age of the furnace, and
size of the house. (*Note*: I have made some modifications
to this problem since creating the supporting video. Although the video
shows a different problem number, it is the solution for this
problem.)

a) Prepare a scatter plot showing the relationship between the heating cost and each of the independent variables.

b) If the home inspector wanted to build a regression model using only one independent variable to predict heating cost, which variable should be used?

c) Why?

d) How do you use the value of Significance F in the model with only one independent variable?

e) If the home inspector wanted to build a regression model using two independent variables to predict heating cost, which variable should be added to the model?

f) Why?

g) If the home inspector wanted to build a regression model using three independent variables to predict heating cost, which variable should be added to the two-variable model?

h) Why?

i) If the home inspector wanted to build a regression model using four independent variables to predict heating cost, which variable should be added to the three-variable model?

j) Why?

k) How do you use the value of Significance F in the model with more than one independent variable?

l) Does there appear to be any multicollinearity among the independent variables?

m) How can you tell if you have multicollinearity?

n) Which sets of variables indicate multicollinearity?

o) Based on your best model, what is the expected average monthly heating cost for a home that has an average outside temperature of 45, 8 inches of attic insulation, a 7-year-old furnace, and 2,000 square feet?

Answer #1

a)

Scatter plots of heating cost against each independent variable:

i) Average outside temperature

ii) Attic insulation

iii) Age of furnace

iv) Square footage

b)

Square footage should be used as the independent variable, with heating cost as the dependent variable.

c)

If only one independent variable is to be used for regression, then, based on the scatter plots, square footage should be chosen because it shows the strongest linear relationship (linear trend) with the dependent variable, heating cost.
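
The visual impression from the scatter plots can be checked numerically by computing the correlation between heating cost and each candidate variable. The following is a minimal sketch in plain Python, assuming the worksheet values are typed in as lists; the variable names and the `pearson_r` helper are illustrative and not part of the original problem.

```python
from math import sqrt

# Data copied from the worksheet in the question (20 homes).
temp = [29, 8, 6, 22, 55, 36, 28, 36, 59, 64, 19, 57, 39, 25, 28, 53, 47, 20, 39, 60]
insulation = [5, 6, 10, 8, 2, 2, 4, 7, 5, 5, 4, 5, 7, 9, 6, 11, 5, 4, 4, 8]
furnace_age = [4, 7, 9, 11, 4, 5, 9, 2, 9, 6, 8, 3, 11, 8, 4, 2, 2, 14, 6, 6]
sqft = [1900, 2800, 2500, 2000, 1300, 2100, 2400, 2300, 1300, 1500,
        2300, 1400, 1900, 2100, 1800, 1200, 2000, 2900, 2600, 1500]
cost = [198, 355, 291, 230, 121, 250, 360, 164, 42, 90,
        271, 96, 187, 235, 138, 71, 206, 398, 319, 72]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


predictors = {
    "Avg Outside Temp": temp,
    "Attic Insulation": insulation,
    "Age of Furnace": furnace_age,
    "Square Footage": sqft,
}

# Correlation of each candidate variable with heating cost.
correlations = {name: pearson_r(values, cost) for name, values in predictors.items()}

# The best single predictor is the one with the largest |r|.
strongest = max(correlations, key=lambda name: abs(correlations[name]))

for name, r in sorted(correlations.items(), key=lambda item: -abs(item[1])):
    print(f"{name:<18} r = {r:+.3f}")
print("Strongest linear relationship:", strongest)
```

The sign of each correlation only gives the direction of the trend; the choice of a single predictor depends on the absolute value.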

d)

With only one independent variable, Significance F (the p-value of the overall F-test) tests that single variable, so it matches the p-value of the variable's slope coefficient. If it is less than 0.05 (the cutoff at the 95% confidence level), the variable has a statistically significant linear relationship with heating cost.

If it is greater than the cutoff, the model does not predict significantly better than the intercept alone, meaning the variable does not show a significant linear relationship with heating cost.
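
As a rough illustration of this test, the sketch below fits heating cost on square footage and computes the F statistic by hand. The Python standard library has no F distribution, so instead of the p-value that Excel reports as Significance F, the sketch compares F with the tabulated 5% critical value for 1 and 18 degrees of freedom (about 4.41, taken from a standard F table). Saying that F exceeds this critical value is equivalent to saying that Significance F is below 0.05. The variable names are assumptions made for the example.

```python
# Data copied from the worksheet in the question (20 homes).
sqft = [1900, 2800, 2500, 2000, 1300, 2100, 2400, 2300, 1300, 1500,
        2300, 1400, 1900, 2100, 1800, 1200, 2000, 2900, 2600, 1500]
cost = [198, 355, 291, 230, 121, 250, 360, 164, 42, 90,
        271, 96, 187, 235, 138, 71, 206, 398, 319, 72]

n = len(cost)
mean_x = sum(sqft) / n
mean_y = sum(cost) / n

# Least-squares slope and intercept for cost = intercept + slope * sqft.
sxx = sum((x - mean_x) ** 2 for x in sqft)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, cost))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# ANOVA decomposition: total = regression + residual.
ss_total = sum((y - mean_y) ** 2 for y in cost)
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(sqft, cost))
ss_regr = ss_total - ss_resid

df_regr = 1           # one independent variable
df_resid = n - 2      # 20 observations minus 2 estimated coefficients
f_stat = (ss_regr / df_regr) / (ss_resid / df_resid)

# Upper 5% point of F(1, 18) from a standard F table (assumed cutoff).
F_CRITICAL_05 = 4.41
significant = f_stat > F_CRITICAL_05

print(f"F = {f_stat:.2f} on ({df_regr}, {df_resid}) degrees of freedom")
print("Significant at the 5% level:", significant)
```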

e)

If an additional variable is to be used for regression, average outside temperature should be used.

f)

This is because it shows the second-strongest linear relationship with heating cost, after square footage.
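
A common way to check a choice like this is to fit each two-variable model that contains square footage and compare adjusted R². The sketch below does that in plain Python, using the textbook identity that gives R² for two predictors from the three pairwise correlations. The helper names are illustrative, not from the original problem. The ranking it prints need not match the ranking by simple correlation, because a candidate that is itself strongly correlated with square footage adds less new information to the model.

```python
from math import sqrt

# Data copied from the worksheet in the question (20 homes).
temp = [29, 8, 6, 22, 55, 36, 28, 36, 59, 64, 19, 57, 39, 25, 28, 53, 47, 20, 39, 60]
insulation = [5, 6, 10, 8, 2, 2, 4, 7, 5, 5, 4, 5, 7, 9, 6, 11, 5, 4, 4, 8]
furnace_age = [4, 7, 9, 11, 4, 5, 9, 2, 9, 6, 8, 3, 11, 8, 4, 2, 2, 14, 6, 6]
sqft = [1900, 2800, 2500, 2000, 1300, 2100, 2400, 2300, 1300, 1500,
        2300, 1400, 1900, 2100, 1800, 1200, 2000, 2900, 2600, 1500]
cost = [198, 355, 291, 230, 121, 250, 360, 164, 42, 90,
        271, 96, 187, 235, 138, 71, 206, 398, 319, 72]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared_two(y, x1, x2):
    """R^2 of y regressed on x1 and x2, from the pairwise correlations."""
    r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = len(cost)
r2_sqft_only = pearson_r(sqft, cost) ** 2

# Candidate second variables, each paired with square footage.
candidates = {
    "Avg Outside Temp": temp,
    "Attic Insulation": insulation,
    "Age of Furnace": furnace_age,
}
r2_two = {name: r_squared_two(cost, sqft, values) for name, values in candidates.items()}
adj_r2_two = {name: adjusted_r_squared(r2, n, 2) for name, r2 in r2_two.items()}

print(f"Square Footage alone: adjusted R^2 = {adjusted_r_squared(r2_sqft_only, n, 1):.4f}")
for name, adj in sorted(adj_r2_two.items(), key=lambda item: -item[1]):
    print(f"Square Footage + {name:<17} adjusted R^2 = {adj:.4f}")
```

Adjusted R² is used rather than plain R² because plain R² can never decrease when a variable is added, so it cannot tell whether the extra variable is worth including.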

