Question

Suppose that you have the following data for x (the independent variable) and y (the response variable), to be analyzed in RStudio:

independent.var = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)

response.var = c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

A) Using a linear model, fit a line to the above data without using a re-expression. Show the fitted line relative to a scatterplot of the data. Comment on what you see in terms of fit, and also calculate R².

B) Re-express the data so that you obtain a better linear fit, and explain how and why you chose your re-expression. Also, show the re-expressed fitted line relative to the re-expressed data. Comment on what you see in terms of fit, and also calculate R².

The data can be entered as below.

```
> x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
> y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)
```

(A) The linear model is fitted with lm(), and summary() reports the results below.

```
> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-3.0540 -1.8463 -0.1575  0.9226  5.9013

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.44697    1.62198   7.057 3.47e-05 ***
x           -0.20965    0.04408  -4.756 0.000773 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.635 on 10 degrees of freedom
Multiple R-squared:  0.6935,    Adjusted R-squared:  0.6628
F-statistic: 22.62 on 1 and 10 DF,  p-value: 0.0007726
```

R² can be extracted directly from the summary object:

```
> summary(lm(y ~ x))$r.squared
[1] 0.6934783
```

The plot is below. The arguments to abline() are the intercept and slope estimated by the regression above.

```
> plot(x, y)
> abline(11.44697, -0.20965)
```
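Retyping rounded coefficients into abline() is easy to get wrong. As a small sketch of an alternative (the object name `fit` is an assumption, not part of the original answer), the model can be stored once and passed straight to abline(), which reads the intercept and slope from it:

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Fit once and keep the model object ('fit' is an illustrative name)
fit <- lm(y ~ x)

# Scatterplot with the fitted line drawn from the stored model
plot(x, y, xlab = "x", ylab = "y", main = "Linear fit, no re-expression")
abline(fit)

# R-squared taken from the same object
r2 <- summary(fit)$r.squared
r2
```

Keeping one model object also means the plot, the coefficients, and R² are guaranteed to come from the same fit.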

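The fit is poor. The points lie above the regression line at the smallest x values, below it through the middle of the range, and above it again at the largest x values. This systematic pattern indicates that the relationship is curved rather than linear, and R² is only about 0.69.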
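The above/below/above pattern is easier to see in a residual plot than in the scatterplot. The following is a minimal sketch; the names `fit_lin` and `res_lin` are assumptions introduced for illustration.

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Residuals of the straight-line fit ('fit_lin', 'res_lin' are illustrative names)
fit_lin <- lm(y ~ x)
res_lin <- resid(fit_lin)

# Residuals against x, with a dashed reference line at zero
plot(x, res_lin, ylab = "Residual", main = "Residuals of y ~ x")
abline(h = 0, lty = 2)

# Sign of each residual in order of x: a run of positives,
# then a run of negatives, then positives again
sign(res_lin)
```

A well-specified linear model would show residuals scattered around zero without long runs of one sign.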

________________________________________________________________________

(B) Taking the logarithm of both x and y and refitting the linear model gives the results below.

```
> summary(lm(log(y) ~ log(x)))

Call:
lm(formula = log(y) ~ log(x))

Residuals:
     Min       1Q   Median       3Q      Max
-0.16770 -0.05955 -0.01141  0.01772  0.29541

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.66177    0.15825   29.46 4.74e-11 ***
log(x)      -1.05808    0.04718  -22.43 6.99e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1183 on 10 degrees of freedom
Multiple R-squared:  0.9805,    Adjusted R-squared:  0.9786
F-statistic: 502.9 on 1 and 10 DF,  p-value: 6.988e-10
```

R² for the re-expressed model is:

```
> summary(lm(log(y) ~ log(x)))$r.squared
[1] 0.9805042
```

The plot is as below.

```
> plot(log(x), log(y))
> abline(4.66177, -1.05808)
```

In this specification the line fits the data much better than before. The points now fall above and below the regression line with no systematic pattern, and R² (measured on the log scale) rises from about 0.69 to about 0.98.
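One rough way to check the statement about residual patterns is to count how often consecutive residuals switch sign in each model. This is only an informal sketch, not a formal runs test, and the helper `sign_changes` is a hypothetical function written for this illustration.

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Both fits ('fit_lin' and 'fit_log' are illustrative names)
fit_lin <- lm(y ~ x)
fit_log <- lm(log(y) ~ log(x))

# Hypothetical helper: number of sign switches between consecutive residuals
sign_changes <- function(res) sum(diff(sign(res)) != 0)

changes_lin <- sign_changes(resid(fit_lin))
changes_log <- sign_changes(resid(fit_log))

# More switches means shorter runs on one side of the line
c(linear = changes_lin, log_log = changes_log)
```

The linear fit has few switches because its residuals come in long runs, while the log-log fit switches sign more often, which is what an unpatterned scatter around the line looks like.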

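This re-expression was chosen because the plot in part (A) shows y falling steeply at first and then leveling off, which suggests a power curve of the form $y = a x^{b}$ with $b < 0$. Taking logarithms of both sides, this can be re-expressed as $\log(y) = \log(a x^{b})$, or $\log(y) = \log(a) + b \log(x)$, or $\log(y) = \beta_0 + \beta_1 \log(x)$, which is linear in $\log(x)$ and is the model used.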
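To see what the log-log fit implies on the original scale, the fitted line can be transformed back into a power curve and drawn over the raw data. This is a sketch under the assumption that the model is log(y) = log(a) + b log(x), so that a is the exponential of the intercept and b is the slope; the names `a` and `b` are introduced here for illustration.

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Log-log fit, as in part (B)
fit_log <- lm(log(y) ~ log(x))

# Back-transform: intercept is log(a), slope is b
a <- exp(coef(fit_log)[[1]])
b <- coef(fit_log)[[2]]

# Original data with the implied power curve y = a * x^b
plot(x, y, main = "Back-transformed log-log fit")
curve(a * x^b, from = min(x), to = max(x), add = TRUE)
```

With a slope close to -1, the curve is roughly y = a / x, meaning y is approximately inversely proportional to x.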