Question

Suppose that you have the following data for x (the independent variable) and y (the response variable). Use RStudio.

independent.var = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)

response.var = c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

A) Using a linear model, fit a line to the above data without using a re-expression. Show the fitted line relative to a scatterplot of the data. Comment on what you see in terms of fit, and also calculate R^2.

B) Re-express the data so that you obtain a better linear fit, and explain how and why you chose your re-expression. Also, show the re-expressed fitted line relative to the re-expressed data. Comment on what you see in terms of fit, and also calculate R^2.

Homework Answers

Answer #1

The data can be entered as below.

> x <- c(5,10,15,20,25,30,35,40,45,50,55,60)
> y <- c(16.3,9.7,8.1,4.2,3.4,2.9,2.4,2.3,1.9,1.7,1.4,1.3)

(A) Fit the linear model on the raw data and inspect its summary.

> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0540 -1.8463 -0.1575  0.9226  5.9013 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.44697    1.62198   7.057 3.47e-05 ***
x           -0.20965    0.04408  -4.756 0.000773 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.635 on 10 degrees of freedom
Multiple R-squared:  0.6935,    Adjusted R-squared:  0.6628 
F-statistic: 22.62 on 1 and 10 DF,  p-value: 0.0007726

The r-squared would be as below.

> summary(lm(y ~ x))$r.squared
[1] 0.6934783
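As a cross-check, R-squared can also be computed from its definition, 1 - RSS/TSS. The following is a minimal sketch; the names fit, rss, tss and r2_manual are illustrative and are not part of the original answer.

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Store the fit once so it can be reused
fit <- lm(y ~ x)

# Residual sum of squares and total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((y - mean(y))^2)

# R-squared from its definition
r2_manual <- 1 - rss / tss
r2_manual
```

The result should agree with the value reported by summary(). Storing the model in an object also avoids refitting it for every query.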

The plot is produced as below. The abline() function takes the intercept and slope coefficients estimated by the regression above.

> plot(x,y)
> abline(11.44697,-0.20965)

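As can be seen, the straight line fits poorly: the first two points lie above the regression line, the next seven lie below it, and the last three lie above it again. Such a systematic pattern indicates curvature that a straight line cannot capture, which is consistent with the modest R-squared of 0.6935.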
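The above/below pattern is easier to see in a residual plot. The following is a hedged sketch rather than part of the original answer; fit_lin, res_lin and runs_lin are illustrative names, and rle() is used here only as a simple way to count consecutive residuals of the same sign.

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

fit_lin <- lm(y ~ x)
res_lin <- residuals(fit_lin)

# Residuals against x; a curved band around zero signals a poor linear fit
plot(x, res_lin, xlab = "x", ylab = "Residual",
     main = "Residuals of the straight-line fit")
abline(h = 0, lty = 2)

# Lengths of consecutive runs of same-signed residuals
runs_lin <- rle(unname(sign(res_lin)))
runs_lin$lengths
```

A small number of long runs means the residuals are not scattered randomly around zero.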

________________________________________________________________________

(B) Re-express both variables with natural logarithms and refit. The new regression results are as below.

> summary(lm(log(y) ~ log(x)))

Call:
lm(formula = log(y) ~ log(x))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.16770 -0.05955 -0.01141  0.01772  0.29541 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.66177    0.15825   29.46 4.74e-11 ***
log(x)      -1.05808    0.04718  -22.43 6.99e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1183 on 10 degrees of freedom
Multiple R-squared:  0.9805,    Adjusted R-squared:  0.9786 
F-statistic: 502.9 on 1 and 10 DF,  p-value: 6.988e-10

The r-squared is as below.

> summary(lm(log(y) ~ log(x)))$r.squared
[1] 0.9805042

The plot is as below.

> plot(log(x),log(y))
> abline(4.66177,-1.05808)

As can be seen, the line fits the re-expressed data much better than before: consecutive points now fall above and below the regression line with no systematic pattern, and R-squared rises from 0.6935 to 0.9805.
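The same residual check can be applied to the log-log fit. This is a minimal sketch under the same assumptions as the model above; fit_log, res_log and the helper n_runs() are hypothetical names introduced for illustration.

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

fit_log <- lm(log(y) ~ log(x))
res_log <- residuals(fit_log)

# Residuals on the log scale against log(x)
plot(log(x), res_log, xlab = "log(x)", ylab = "Residual",
     main = "Residuals of the log-log fit")
abline(h = 0, lty = 2)

# Hypothetical helper: number of runs of same-signed residuals
n_runs <- function(r) length(rle(unname(sign(r)))$lengths)

# More runs means the signs alternate more often along x
c(linear = n_runs(residuals(lm(y ~ x))), loglog = n_runs(res_log))
```

With only 12 observations this is an informal diagnostic, not a formal test of randomness.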

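The reason this specification is chosen is that the plot in part (A) shows a steeply decaying, convex curve, which suggests a power-law relationship of the form y = a * x^b with b < 0. Taking logarithms of both sides re-expresses this as log(y) = log(a * x^b), or log(y) = log(a) + b * log(x), which is linear in log(x) and is the form used above.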
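The choice of re-expression can also be explored numerically. The sketch below fits a few common candidates and then draws the back-transformed log-log fit on the original scale. The candidate set and the names candidates, r2, a_hat and b_hat are assumptions made for illustration; the original answer only fits the log-log model.

```r
# Data from the question
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# A few common re-expressions (illustrative, not exhaustive)
candidates <- list(
  "y ~ x"           = lm(y ~ x),
  "log(y) ~ x"      = lm(log(y) ~ x),
  "I(1/y) ~ x"      = lm(I(1 / y) ~ x),
  "log(y) ~ log(x)" = lm(log(y) ~ log(x))
)

# R-squared of each candidate on its own response scale
r2 <- sapply(candidates, function(m) summary(m)$r.squared)
round(r2, 4)

# Back-transform the log-log fit: y = exp(intercept) * x^slope
fit_log <- candidates[["log(y) ~ log(x)"]]
a_hat <- exp(coef(fit_log)[[1]])
b_hat <- coef(fit_log)[[2]]

# Fitted power curve over the original scatterplot
plot(x, y)
curve(a_hat * x^b_hat, from = min(x), to = max(x), add = TRUE)
```

R-squared values computed on different response scales (y, log(y), 1/y) are not strictly comparable, so they are best read together with the residual plots rather than as a ranking on their own.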
