Question

Problem 1: Oil Production Data: The Data in the following are the annual world crude oil          ...

Problem 1: Oil Production Data: The Data in the following are the annual world crude oil           production in millions of barrels for the period 1880-1988. The data are taken from Moore and McCabe( 1993, p. 147).

  1. Construct a scatter plot of the oil production variable (OIL) versus Year and observe that the scatter of points on the graph is not linear. In order to fit a linear model to these data, OIL must be transformed.
  2. Construct a scatter plot of log(OIL) versus Year. The scatter of points now follows a straight line from 1880 to 1973. Political turmoil in the oil-producing regions of the Middle East affected patterns of oil production after 1973.
  3. Fit a linear regression line of log (OIL) on Year. Assess the goodness of fit of the model.
  4. Construct the index plot of the standardized residuals. This graph shows clearly that one of the standard assumptions is violated. Which one?

Data:(SAS)

year barrels

1880 30

1890 77

1900 149

1905 215

1910 328

1915 432

1920 689

1925 1069

1930 1412

1935 1655

1940 2150

1945 2595

1950 3803

1955 5626

1960 7674

1962 8882

1964 10310

1966 12016

1968 14104

1970 16690

1972 18584

1974 20389

1976 20188

1978 21922

1980 21722

1982 19411

1984 19837

1986 20246

1988 21388

Homework Answers

Answer #1

Answer:

The given data is as below:

year barrels
1880 30
1890 77
1900 149
1905 215
1910 328
1915 432
1920 689
1925 1069
1930 1412
1935 1655
1940 2150
1945 2595
1950 3803
1955 5626
1960 7674
1962 8882
1964 10310
1966 12016
1968 14104
1970 16690
1972 18584
1974 20389
1976 20188
1978 21922
1980 21722
1982 19411
1984 19837
1986 20246
1988 21388

a) Scatter plot of the oil production variable (OIL) versus Year:

Procedure for creating the Scatter Plot using MS-Excel:

  1. Enter the data into Excel worksheet as shown above.
  2. In your spreadsheet, select the data to use for your scatter plot. Here you need two variables (year, barrels)
  3. Click Insert > charts > Insert Scatter (year, barrels) or Bubble Chart, and then pick the Scatter plot.
  4. Click the graph and then click the icons next to the chart to add finishing touches

b) Scatter plot of log(OIL) versus Year:

After logarithmic transformation of the OIL variable (barrels)  the data becomes:

year log(barrels)
1880 3.401197
1890 4.343805
1900 5.003946
1905 5.370638
1910 5.793014
1915 6.068426
1920 6.535241
1925 6.974479
1930 7.252762
1935 7.411556
1940 7.673223
1945 7.861342
1950 8.243546
1955 8.635154
1960 8.945593
1962 9.091782
1964 9.24087
1966 9.393994
1968 9.554214
1970 9.722565
1972 9.830056
1974 9.922751
1976 9.912844
1978 9.995246
1980 9.986081
1982 9.873595
1984 9.895304
1986 9.915713
1988 9.970585

Procedure for creating the Scatter Plot using MS-Excel:

  1. Enter the data into Excel worksheet as shown above.
  2. In your spreadsheet, select the data to use for your scatter plot. Here you need two variables (year, log(barrels)).
  3. Click Insert > charts > Insert Scatter (year, log(barrels)) or Bubble Chart, and then pick the Scatter plot.
  4. Click the graph and then click the icons next to the chart to add finishing touches

c) Fitting of  linear regression line of log (OIL) on Year

Regression Analysis using MS-Excel:

Enter the transformed data into excel worksheet as follows:

year log(barrels)
1880 3.401197
1890 4.343805
1900 5.003946
1905 5.370638
1910 5.793014
1915 6.068426
1920 6.535241
1925 6.974479
1930 7.252762
1935 7.411556
1940 7.673223
1945 7.861342
1950 8.243546
1955 8.635154
1960 8.945593
1962 9.091782
1964 9.24087
1966 9.393994
1968 9.554214
1970 9.722565
1972 9.830056
1974 9.922751
1976 9.912844
1978 9.995246
1980 9.986081
1982 9.873595
1984 9.895304
1986 9.915713
1988 9.970585

Procedure:

1.Enter the data into Excel sheet.
2.If this is the first time you have used an Excel add-in, click the File tab, otherwise skip to step 7.
3.Click Options from the list on the left.
4.Select Add-ins in the Excel Options box.
5.In the Add-in list box, select Analysis Toolbox-VBA from the Inactive Application Add-ins list.
6.Click OK.
7.Then select Data/ Data Analysis tab from the menu bar.
8.The Data Analysis dialog box will appear on the screen.
9.From the Data Analysis dialog box, select Regression and click OK.
10.The Regression dialog box will appear on the screen.
11.Place independent variables (Year) in Input X Range and place dependent variable (Log(barrels)) in Input Y Range. Select the Labels check-box.
12.Select the Residuals and Standardized Residuals.
13.Click OK.

Regression Analysis output:

Regression Statistics
Multiple R 0.991703
R Square 0.983474
Adjusted R Square 0.982862
Standard Error 0.256429
Observations 29
ANOVA
df SS MS F Significance F
Regression 1 105.6587 105.6587 1606.838 1.35E-25
Residual 27 1.775403 0.065756
Total 28 107.4341
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -111.882 2.994322 -37.3646 8.73E-25 -118.026 -105.738
year 0.061589 0.001536 40.08539 1.35E-25 0.058436 0.064741

From the above Excel output, the regression line of log(oil) on year is as below:

Assessing the goodness of fit for this model:

Recall: R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination.

R-squared is always between 0 and 100%:

  • 0% indicates that the model explains none of the variability of the response data around its mean.
  • 100% indicates that the model explains all the variability of the response data around its mean.

In general, the higher the R-squared, the better the model fits your data.

From the above Excel output, the coefficient of determination: R2 = 0.9835 = 98%.

It indicates that he overall fitting of this linear regression model is good.

d) Using Residual analysis to test the assumptions of the regression model:

From the output of Excel:

Observation Predicted log(barrels) Residuals Standard Residuals
1 3.905435 -0.50424 -2.00247
2 4.521324 -0.17752 -0.70498
3 5.137213 -0.13327 -0.52924
4 5.445158 -0.07452 -0.29594
5 5.753102 0.039912 0.1585
6 6.061047 0.007379 0.029304
7 6.368991 0.16625 0.660226
8 6.676935 0.297543 1.181628
9 6.98488 0.267882 1.063836
10 7.292824 0.118732 0.471518
11 7.600769 0.072454 0.287736
12 7.908713 -0.04737 -0.18813
13 8.216658 0.026888 0.106778
14 8.524602 0.110552 0.439032
15 8.832547 0.113046 0.448939
16 8.955725 0.136057 0.540322
17 9.078902 0.161967 0.643217
18 9.20208 0.191914 0.762145
19 9.325258 0.228956 0.909248
20 9.448436 0.274129 1.088644
21 9.571614 0.258443 1.026349
22 9.694791 0.227959 0.905291
23 9.817969 0.094875 0.376773
24 9.941147 0.054099 0.214842
25 10.06432 -0.07824 -0.31073
26 10.1875 -0.31391 -1.24661
27 10.31068 -0.41538 -1.64958
28 10.43386 -0.51815 -2.0577
29 10.55704 -0.58645 -2.32896

i) Linearity of Regression model:

Linearity of the regression model can be obtained by plotting the residuals on the vertical axis against the corresponding values of independent variable (Year) on the horizontal axis.

The above residual plot shows non-linearity. Therefore, it indicates the violation of the Linearity assumption.

ii) Assumption of Homoscedasticity (Constant Error Variance):

The assumption of Homoscedasticity can be understood by examining the graph between residuals and the fitted values.

In this plot the residuals are scattered randomly around zero, hence, the errors have constant variance or do not violate the assumption of homoscedasticity.

iii) Assumption of Independence of Error:

This assumption is particularly important when the data is collected over a period of time. Residual versus time graph can be plotted to ascertain the assumption of independence of error.

The above plot indicates the independence of error, i.e., this assumption is not violated.

iv) Assumption of Normality of Error:

This assumption can be tested by examining the normal probability plot of residuals.

The normal probability plot of the residuals should be roughly follow a straight line for meeting the assumption of normality. A straight line connecting the residuals indicates that the residuals are normally distributed.

Hence, this graph shows clearly that one of the standard assumptions is violated and the violated assumption is the Assumption of Linearity of Regression model.

Know the answer?
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for?
Ask your own homework help question
Similar Questions
ADVERTISEMENT
Need Online Homework Help?

Get Answers For Free
Most questions answered within 1 hours.

Ask a Question
ADVERTISEMENT