Suppose that you have the following data for x (the independent variable) and y (the response variable). Use RStudio.
independent.var = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
response.var = c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)
A) Using a linear model, fit a line to the above data without using a re-expression. Show the fitted line relative to a scatterplot of the data. Comment on what you see in terms of fit, and also calculate R².
B) Re-express the data so that you obtain a better linear fit, and explain how and why you chose your re-expression. Also, show the re-expressed fitted line relative to the re-expressed data. Comment on what you see in terms of fit, and also calculate R².
The data can be entered as below.
> x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
> y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)
(A) The regression command and its output are as below.
> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-3.0540 -1.8463 -0.1575  0.9226  5.9013

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.44697    1.62198   7.057 3.47e-05 ***
x           -0.20965    0.04408  -4.756 0.000773 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.635 on 10 degrees of freedom
Multiple R-squared:  0.6935,  Adjusted R-squared:  0.6628
F-statistic: 22.62 on 1 and 10 DF,  p-value: 0.0007726
The r-squared would be as below.
> summary(lm(y ~ x))$r.squared
[1] 0.6934783
The plot is as below. The abline() function takes the intercept and slope coefficients found in the regression output above.
> plot(x, y)
> abline(11.44697, -0.20965)
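Typing the rounded coefficients into abline() works, but abline() also accepts a fitted lm object directly, which avoids copying numbers by hand. The following is a minimal sketch of that alternative; the names fit and r2 are introduced here for illustration and do not appear in the answer above.

```r
# Same data as entered above.
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Fit once and keep the model object (the name "fit" is an assumption).
fit <- lm(y ~ x)

# R-squared taken from the stored fit instead of refitting.
r2 <- summary(fit)$r.squared
print(r2)

# Scatterplot with the fitted line; abline() reads the intercept
# and slope from the lm object.
plot(x, y, main = "Linear fit on the original scale")
abline(fit)
```

Storing the model once also means the summary, the R-squared, and the plotted line are guaranteed to come from the same fit.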
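As can be seen, the fit is not appropriate: the first points lie above the regression line, the next several consecutive points lie below it, and the last points lie above it again. Such a systematic pattern indicates curvature that a straight line cannot capture, even though R² is about 0.69.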
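The above/below/above pattern can be checked more directly with a residual plot. The sketch below is one way to do that; the names fit, res, and runs are illustrative assumptions, and rle() is used only to count runs of same-signed residuals.

```r
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

fit <- lm(y ~ x)
res <- resid(fit)   # observed y minus fitted y

# Residuals against x, with a dashed reference line at zero.
plot(x, res, xlab = "x", ylab = "Residual",
     main = "Residuals of the linear fit")
abline(h = 0, lty = 2)

# Runs of consecutive residuals with the same sign:
# +1 means above the line, -1 means below it.
runs <- rle(as.vector(sign(res)))
print(runs)
```

A small number of long runs, rather than signs that alternate irregularly, is what signals that the straight-line model is missing curvature.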
________________________________________________________________________
________________________________________________________________________
(B) After re-expressing both x and y with natural logarithms, the new regression results are as below.
> summary(lm(log(y) ~ log(x)))

Call:
lm(formula = log(y) ~ log(x))

Residuals:
     Min       1Q   Median       3Q      Max
-0.16770 -0.05955 -0.01141  0.01772  0.29541

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.66177    0.15825   29.46 4.74e-11 ***
log(x)      -1.05808    0.04718  -22.43 6.99e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1183 on 10 degrees of freedom
Multiple R-squared:  0.9805,  Adjusted R-squared:  0.9786
F-statistic: 502.9 on 1 and 10 DF,  p-value: 6.988e-10
The r-squared is as below.
> summary(lm(log(y) ~ log(x)))$r.squared
[1] 0.9805042
The plot is as below.
> plot(log(x), log(y))
> abline(4.66177, -1.05808)
As can be seen, this new specification fits the data much better than before: the points now fall above and below the regression line without a long systematic run on either side, and R² rises from about 0.69 to about 0.98.
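The same residual check can be applied to the re-expressed model. This is a sketch only; fit_log and res_log are names chosen here for illustration.

```r
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Log-log fit, stored so it can be reused.
fit_log <- lm(log(y) ~ log(x))
r2_log <- summary(fit_log)$r.squared
print(r2_log)

# Residuals on the log scale against log(x).
res_log <- resid(fit_log)
plot(log(x), res_log, xlab = "log(x)", ylab = "Residual",
     main = "Residuals of the log-log fit")
abline(h = 0, lty = 2)
```

Note that these residuals are on the log scale, so their size is not directly comparable with the residuals of the original linear fit.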
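This specification was chosen because the plot in part (A) shows a curved, steadily decaying shape rather than a straight line, which suggests a power relationship of the form y = a * x^b with b < 0. Taking logarithms of both sides re-expresses this as log(y) = log(a) + b * log(x), which is linear in log(x) and is the form used. The fitted slope of about -1.06 is close to -1, so the data roughly follow y = a / x.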
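To see what the log-log fit implies on the original scale, the fitted line can be back-transformed into a power curve and drawn over the raw data. The sketch below assumes the names fit_log, a, and b, none of which appear in the answer above; it applies a plain exp() back-transformation with no bias correction.

```r
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

fit_log <- lm(log(y) ~ log(x))

# log(y) = log(a) + b * log(x)  is equivalent to  y = a * x^b
a <- exp(coef(fit_log)[[1]])   # back-transformed intercept
b <- coef(fit_log)[[2]]        # exponent (slope on the log-log scale)

# Raw data with the back-transformed power curve overlaid.
plot(x, y, main = "Power curve from the log-log fit")
curve(a * x^b, from = min(x), to = max(x), add = TRUE)
```

Plotting the curve against the untransformed points gives a visual check that the re-expression describes the original data, not only the logged data.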