Question

Suppose that you have the following data for x (the independent variable) and y (the response variable). Use RStudio.

independent.var = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)

response.var = c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

A) Using a linear model, fit a line to the above data without using a re-expression. Show the fitted line relative to a scatterplot of the data. Comment on what you see in terms of fit, and also calculate R^{2}.

B) Re-express the data so that you obtain a better linear fit, and explain how and why you chose your re-expression. Also, show the re-expressed fitted line relative to the re-expressed data. Comment on what you see in terms of fit, and also calculate R^{2}.

Answer #1

The data can be entered as below.

> x <- c(5,10,15,20,25,30,35,40,45,50,55,60)
> y <- c(16.3,9.7,8.1,4.2,3.4,2.9,2.4,2.3,1.9,1.7,1.4,1.3)

**(A)** The regression command and its output are as below.

> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-3.0540 -1.8463 -0.1575  0.9226  5.9013

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.44697    1.62198   7.057 3.47e-05 ***
x           -0.20965    0.04408  -4.756 0.000773 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.635 on 10 degrees of freedom
Multiple R-squared:  0.6935, Adjusted R-squared:  0.6628
F-statistic: 22.62 on 1 and 10 DF,  p-value: 0.0007726

The R-squared is as below.

> summary(lm(y ~ x))$r.squared
[1] 0.6934783
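For reference, the same value can be reproduced from the definition R^{2} = 1 - RSS/TSS. This is only a minimal sketch; the names `fit`, `rss`, `tss`, and `r2` are introduced here for illustration and do not appear in the answer above.

```r
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

fit <- lm(y ~ x)                # store the model so it is fitted only once
rss <- sum(residuals(fit)^2)    # residual sum of squares
tss <- sum((y - mean(y))^2)     # total sum of squares about the mean of y
r2  <- 1 - rss / tss            # coefficient of determination

# In simple linear regression R^2 also equals the squared correlation
print(all.equal(r2, summary(fit)$r.squared))
print(all.equal(r2, cor(x, y)^2))
```

Storing the fitted model in an object also avoids refitting it each time a single quantity is needed.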

The plot is as below. The abline() function takes the intercept and slope coefficients found in the regression output above.

> plot(x,y)
> abline(11.44697,-0.20965)

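As can be seen, the straight-line fit is not appropriate: the first points lie above the regression line, the following points lie consecutively below it, and the last points lie above it again. Such a systematic pattern indicates curvature that a straight line cannot capture.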
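The pattern described above can also be checked numerically instead of by eye. The sketch below is an illustration and not part of the original answer: it passes the fitted model object to abline() rather than typing the rounded coefficients, and it uses rle() to count runs of same-signed residuals. The names `fit`, `res_sign`, and `runs` are assumptions made for this example.

```r
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

fit <- lm(y ~ x)

plot(x, y)
abline(fit)                     # abline() reads the coefficients from the model

# Sign of each residual, in order of increasing x
res_sign <- unname(sign(residuals(fit)))
print(res_sign)

# Run-length encoding: a few long runs of one sign suggest curvature
runs <- rle(res_sign)
print(runs$lengths)

# Residuals against fitted values, with a reference line at zero
plot(fitted(fit), residuals(fit))
abline(h = 0, lty = 2)
```

A well-specified linear model would usually give many short runs, with residuals scattered around zero and no visible trend.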

**________________________________________________________________________**


**(B)** After re-expressing both variables on the log scale, the new regression results are as below.

> summary(lm(log(y) ~ log(x)))

Call:
lm(formula = log(y) ~ log(x))

Residuals:
     Min       1Q   Median       3Q      Max
-0.16770 -0.05955 -0.01141  0.01772  0.29541

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.66177    0.15825   29.46 4.74e-11 ***
log(x)      -1.05808    0.04718  -22.43 6.99e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1183 on 10 degrees of freedom
Multiple R-squared:  0.9805, Adjusted R-squared:  0.9786
F-statistic: 502.9 on 1 and 10 DF,  p-value: 6.988e-10

The R-squared is as below.

> summary(lm(log(y) ~ log(x)))$r.squared
[1] 0.9805042

The plot is as below.

> plot(log(x),log(y))
> abline(4.66177,-1.05808)

As can be seen, under this new specification the data are fitted much better than before: consecutive points now fall above and below the regression line without a systematic pattern.

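This specification was chosen because the plot in part (A) shows a curved, steadily flattening decay, which resembles a power relationship of the form y = a x^{b}. Taking logarithms of both sides re-expresses this as log(y) = log(a x^{b}), or log(y) = log(a) + log(x^{b}), or log(y) = log(a) + b log(x), which is linear in log(x) and is the form used.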
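One way to support this choice is to compare it with a plausible alternative and then map the fitted line back to the original scale. The sketch below is an illustration only. The semi-log model log(y) ~ x is an alternative introduced here for comparison and is not part of the original answer, and the names `fit_loglog`, `fit_semilog`, `a_hat`, and `b_hat` are assumptions. Both models share the response log(y), so their R-squared values can be compared directly.

```r
x <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

fit_loglog  <- lm(log(y) ~ log(x))   # power form:       y = a * x^b
fit_semilog <- lm(log(y) ~ x)        # exponential form: y = a * exp(b * x)

r2_loglog  <- summary(fit_loglog)$r.squared
r2_semilog <- summary(fit_semilog)$r.squared
print(c(loglog = r2_loglog, semilog = r2_semilog))

# Back-transform the log-log fit: intercept = log(a), slope = b
a_hat <- unname(exp(coef(fit_loglog)[1]))
b_hat <- unname(coef(fit_loglog)[2])

# Overlay the fitted power curve on the original scatterplot
plot(x, y)
curve(a_hat * x^b_hat, from = min(x), to = max(x), add = TRUE)
```

The estimated exponent is close to -1, so on the original scale the fitted curve is roughly proportional to 1/x.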

