For the attached data set, create a multiple linear regression equation with profit as the dependent variable and R&D spend, Administration expenses and Marketing spend as the independent variables.
Please answer these 3 questions below.
1. Comment on the adequacy of the model.
2. Identify the predictors that are significant.
3. Interpret the coefficient of determination.
R&D Spend($) | Administration ($) | Marketing Spend ($) | Profit ($) |
165349 | 136898 | 471784 | 192262 |
162598 | 151378 | 443899 | 191792 |
153442 | 101146 | 407935 | 191050 |
144372 | 118672 | 383200 | 182902 |
142107 | 91392 | 366168 | 166188 |
131877 | 99815 | 362861 | 156991 |
118672 | 147199 | 127717 | 156123 |
130298 | 145530 | 323877 | 155753 |
120543 | 148719 | 311613 | 152212 |
123335 | 108679 | 304982 | 149760 |
101913 | 110594 | 229161 | 146122 |
100672 | 91791 | 249745 | 144259 |
93864 | 127320 | 249839 | 141586 |
91992 | 135495 | 252665 | 134307 |
119943 | 156547 | 256513 | 132603 |
114524 | 122617 | 261776 | 129917 |
78013 | 121598 | 264346 | 126993 |
94657 | 145078 | 282574 | 125370 |
91749 | 114176 | 294920 | 124267 |
76254 | 113867 | 298664 | 118474 |
78389 | 153773 | 299737 | 111313 |
73995 | 122783 | 303319 | 110352 |
67533 | 105751 | 304769 | 108734 |
77044 | 99281 | 140575 | 108552 |
64665 | 139553 | 137963 | 107404 |
75329 | 144136 | 134050 | 105734 |
72108 | 127865 | 353184 | 105008 |
66052 | 182646 | 118148 | 103282 |
65605 | 153032 | 107138 | 101005 |
SUMMARY OUTPUT |
Regression Statistics |
Multiple R | 0.96507787 |
R Square | 0.93137529 |
Adjusted R Square | 0.92314032 |
Standard Error | 7895.11433 |
Observations | 29 |
ANOVA |
Source | df | SS | MS | F | Significance F |
Regression | 3 | 21149545193 | 7049848398 | 113.1001 | 1.13593E-14 |
Residual | 25 | 1558320756 | 62332830.24 | | |
Total | 28 | 22707865949 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | 57559.3856 | 11367.44409 | 5.063529246 | 3.16E-05 | 34147.69626 | 80971.07 | 34147.7 | 80971.07 |
R&D Spend($) | 0.81119127 | 0.06324241 | 12.82669766 | 1.7E-12 | 0.680941092 | 0.941441 | 0.680941 | 0.941441 |
Administration ($) | -0.083838 | 0.069633656 | -1.203986514 | 0.239871 | -0.227251182 | 0.059575 | -0.22725 | 0.059575 |
Marketing Spend ($) | 0.02222515 | 0.021498499 | 1.03380026 | 0.311127 | -0.022051834 | 0.066502 | -0.02205 | 0.066502 |
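The ANOVA figures above can be cross-checked by hand. The following is a minimal Python sketch, assuming only the sums of squares and degrees of freedom printed in the table; the variable names are illustrative and not part of the Excel output.

```python
import math

# Values copied from the ANOVA table in the summary output.
ss_regression = 21149545193
ss_residual = 1558320756
df_regression = 3   # number of predictors
df_residual = 25    # 29 observations - 3 predictors - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# The F statistic is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# The regression standard error is the square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"MS regression  = {ms_regression:.2f}")
print(f"MS residual    = {ms_residual:.2f}")
print(f"F statistic    = {f_statistic:.4f}")
print(f"Standard error = {standard_error:.5f}")
```

The results should agree with the reported MS, F and Standard Error values up to rounding.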
RESIDUAL OUTPUT |
Observation | Predicted Profit ($) | Residuals |
1 | 190697.453 | 1564.377223 |
2 | 186631.743 | 5160.317491 |
3 | 182616.369 | 8434.021238 |
4 | 173240.487 | 9661.50307 |
5 | 173311.668 | -7123.727744 |
6 | 164233.162 | -7242.041868 |
7 | 157255.942 | -1133.4318 |
8 | 158253.354 | -2500.754169 |
9 | 149799.783 | 2411.987375 |
10 | 155274.385 | -5514.425192 |
11 | 136051.537 | 10070.41283 |
12 | 137078.673 | 7180.72735 |
13 | 128579.277 | 13006.2432 |
14 | 126438.693 | 7868.656669 |
15 | 147432.715 | -14830.06457 |
16 | 145998.007 | -16080.96726 |
17 | 116523.578 | 10469.35166 |
18 | 128461.694 | -3091.323775 |
19 | 128967.869 | -4700.968596 |
20 | 116507.311 | 1966.719394 |
21 | 114917.893 | -3604.872951 |
22 | 114030.586 | -3678.336242 |
23 | 110248.764 | -1514.773636 |
24 | 114857.564 | -6305.523791 |
25 | 101381.219 | 6023.120827 |
26 | 109560.721 | -3827.181299 |
27 | 113182.1 | -8173.790205 |
28 | 98453.0289 | 4829.351087 |
29 | 100329.246 | 675.3936851 |
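The fitted equation implied by the coefficient table is Profit = 57559.3856 + 0.81119127 (R&D Spend) - 0.083838 (Administration) + 0.02222515 (Marketing Spend). As a rough illustration, the sketch below applies it to observation 1; the function name `predict_profit` is hypothetical, and because the coefficients are rounded as printed, the result matches the residual output only approximately.

```python
# Coefficients copied from the regression output (rounded as printed there).
INTERCEPT = 57559.3856
B_RD = 0.81119127
B_ADMIN = -0.083838
B_MARKETING = 0.02222515


def predict_profit(rd_spend, administration, marketing_spend):
    """Evaluate the fitted regression equation for one observation."""
    return (INTERCEPT
            + B_RD * rd_spend
            + B_ADMIN * administration
            + B_MARKETING * marketing_spend)


# Observation 1 from the data set.
predicted = predict_profit(165349, 136898, 471784)
residual = 192262 - predicted  # actual profit minus predicted profit

print(f"Predicted profit = {predicted:.2f}")
print(f"Residual         = {residual:.2f}")
```

Both values should land within about one dollar of the first row of the residual output (190697.453 and 1564.377223).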
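1. Comment on the adequacy of the model: The overall F-test is highly significant (F = 113.10, Significance F = 1.14E-14), so the three predictors taken together explain a significant share of the variation in profit. R Square and Adjusted R Square are both high (0.93 and 0.92, on a scale from 0 to 1). A large R Square alone, however, does not establish that a linear model is appropriate, nor does it guarantee accurate predictions. The standard error of 7,895 and the individual residuals, which range from about -16,081 to 13,006, are not negligible, and two of the three predictors are not individually significant, so the model should be used with some caution.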
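2. Identify the predictors that are significant: Only R&D Spend($) is significant at the 5% significance level, since its P-value is 1.7E-12 (< 0.05). Administration ($) (P-value 0.2399) and Marketing Spend ($) (P-value 0.3111) are not significant: their P-values exceed 0.05 and their 95% confidence intervals include zero.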
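The significance check can be reproduced from the coefficient table. This is a small sketch, assuming the coefficients, standard errors and P-values as printed, and taking 2.0595 as the two-sided 5% critical value of the t distribution with 25 degrees of freedom (a standard table value, not part of the Excel output). The dictionary names are illustrative.

```python
# (coefficient, standard error, reported P-value) for each predictor,
# copied from the regression output.
predictors = {
    "R&D Spend($)": (0.81119127, 0.06324241, 1.7e-12),
    "Administration ($)": (-0.083838, 0.069633656, 0.239871),
    "Marketing Spend ($)": (0.02222515, 0.021498499, 0.311127),
}

ALPHA = 0.05
T_CRITICAL = 2.0595  # assumed: t(0.975, df=25) from a standard t table

t_stats = {}
intervals = {}
for name, (coef, se, p_value) in predictors.items():
    # t statistic = coefficient divided by its standard error.
    t_stats[name] = coef / se
    # 95% confidence interval = coefficient +/- critical t * standard error.
    intervals[name] = (coef - T_CRITICAL * se, coef + T_CRITICAL * se)

# A predictor is significant at the 5% level when its P-value is below 0.05.
significant = [name for name, (_, _, p) in predictors.items() if p < ALPHA]

for name in predictors:
    low, high = intervals[name]
    print(f"{name}: t = {t_stats[name]:.4f}, 95% CI = ({low:.6f}, {high:.6f})")
print("Significant at 5%:", significant)
```

The same conclusion follows from either rule: a P-value below 0.05, or an absolute t statistic above the critical value, which is equivalent to a confidence interval that excludes zero.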
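3. Interpret the coefficient of determination: The coefficient of determination (R Square) is the ratio of explained variation to total variation, and it tells us how much of the variability in the dependent variable is explained by the model. Here R Square = 0.9314, so about 93.1% of the variation in profit is explained by R&D spend, administration expenses and marketing spend together. Adjusted R Square, which penalizes the number of predictors, is 0.9231.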
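As a worked check of this ratio, the sketch below recomputes R Square, Adjusted R Square and Multiple R from the sums of squares in the ANOVA table. The variable names are illustrative; the adjusted value uses the usual formula 1 - (1 - R²)(n - 1)/(n - k - 1).

```python
import math

# Values copied from the ANOVA table and regression statistics.
ss_regression = 21149545193   # explained variation
ss_total = 22707865949        # total variation
n_observations = 29
n_predictors = 3

# Coefficient of determination: explained variation / total variation.
r_square = ss_regression / ss_total

# Adjusted R Square penalizes the number of predictors in the model.
adjusted_r_square = 1 - (1 - r_square) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

# Multiple R is the square root of R Square.
multiple_r = math.sqrt(r_square)

print(f"R Square          = {r_square:.8f}")
print(f"Adjusted R Square = {adjusted_r_square:.8f}")
print(f"Multiple R        = {multiple_r:.8f}")
```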
PLEASE LET ME KNOW IF YOU HAVE ANY DOUBTS. THANKS!