Question

# Problem 1: Oil Production Data: The Data in the following are the annual world crude oil          ...

Problem 1: Oil Production Data: The Data in the following are the annual world crude oil           production in millions of barrels for the period 1880-1988. The data are taken from Moore and McCabe( 1993, p. 147).

1. Construct a scatter plot of the oil production variable (OIL) versus Year and observe that the scatter of points on the graph is not linear. In order to fit a linear model to these data, OIL must be transformed.
2. Construct a scatter plot of log(OIL) versus Year. The scatter of points now follows a straight line from 1880 to 1973. Political turmoil in the oil-producing regions of the Middle East affected patterns of oil production after 1973.
3. Fit a linear regression line of log (OIL) on Year. Assess the goodness of fit of the model.
4. Construct the index plot of the standardized residuals. This graph shows clearly that one of the standard assumptions is violated. Which one?

Data:（SAS）

year barrels

1880 30

1890 77

1900 149

1905 215

1910 328

1915 432

1920 689

1925 1069

1930 1412

1935 1655

1940 2150

1945 2595

1950 3803

1955 5626

1960 7674

1962 8882

1964 10310

1966 12016

1968 14104

1970 16690

1972 18584

1974 20389

1976 20188

1978 21922

1980 21722

1982 19411

1984 19837

1986 20246

1988 21388

The given data is as below:

 year barrels 1880 30 1890 77 1900 149 1905 215 1910 328 1915 432 1920 689 1925 1069 1930 1412 1935 1655 1940 2150 1945 2595 1950 3803 1955 5626 1960 7674 1962 8882 1964 10310 1966 12016 1968 14104 1970 16690 1972 18584 1974 20389 1976 20188 1978 21922 1980 21722 1982 19411 1984 19837 1986 20246 1988 21388

a) Scatter plot of the oil production variable (OIL) versus Year:

Procedure for creating the Scatter Plot using MS-Excel:

1. Enter the data into Excel worksheet as shown above.
2. In your spreadsheet, select the data to use for your scatter plot. Here you need two variables (year, barrels)
3. Click Insert > charts > Insert Scatter (year, barrels) or Bubble Chart, and then pick the Scatter plot.
4. Click the graph and then click the icons next to the chart to add finishing touches

b) Scatter plot of log(OIL) versus Year:

After logarithmic transformation of the OIL variable (barrels)  the data becomes:

 year log(barrels) 1880 3.401197 1890 4.343805 1900 5.003946 1905 5.370638 1910 5.793014 1915 6.068426 1920 6.535241 1925 6.974479 1930 7.252762 1935 7.411556 1940 7.673223 1945 7.861342 1950 8.243546 1955 8.635154 1960 8.945593 1962 9.091782 1964 9.24087 1966 9.393994 1968 9.554214 1970 9.722565 1972 9.830056 1974 9.922751 1976 9.912844 1978 9.995246 1980 9.986081 1982 9.873595 1984 9.895304 1986 9.915713 1988 9.970585

Procedure for creating the Scatter Plot using MS-Excel:

1. Enter the data into Excel worksheet as shown above.
2. In your spreadsheet, select the data to use for your scatter plot. Here you need two variables (year, log(barrels)).
3. Click Insert > charts > Insert Scatter (year, log(barrels)) or Bubble Chart, and then pick the Scatter plot.
4. Click the graph and then click the icons next to the chart to add finishing touches

c) Fitting of  linear regression line of log (OIL) on Year

Regression Analysis using MS-Excel:

Enter the transformed data into excel worksheet as follows:

 year log(barrels) 1880 3.401197 1890 4.343805 1900 5.003946 1905 5.370638 1910 5.793014 1915 6.068426 1920 6.535241 1925 6.974479 1930 7.252762 1935 7.411556 1940 7.673223 1945 7.861342 1950 8.243546 1955 8.635154 1960 8.945593 1962 9.091782 1964 9.24087 1966 9.393994 1968 9.554214 1970 9.722565 1972 9.830056 1974 9.922751 1976 9.912844 1978 9.995246 1980 9.986081 1982 9.873595 1984 9.895304 1986 9.915713 1988 9.970585

Procedure:

 1.Enter the data into Excel sheet. 2.If this is the first time you have used an Excel add-in, click the File tab, otherwise skip to step 7. 3.Click Options from the list on the left. 4.Select Add-ins in the Excel Options box. 5.In the Add-in list box, select Analysis Toolbox-VBA from the Inactive Application Add-ins list. 6.Click OK. 7.Then select Data/ Data Analysis tab from the menu bar. 8.The Data Analysis dialog box will appear on the screen. 9.From the Data Analysis dialog box, select Regression and click OK. 10.The Regression dialog box will appear on the screen. 11.Place independent variables (Year) in Input X Range and place dependent variable (Log(barrels)) in Input Y Range. Select the Labels check-box. 12.Select the Residuals and Standardized Residuals. 13.Click OK.

Regression Analysis output:

 Regression Statistics Multiple R 0.991703 R Square 0.983474 Adjusted R Square 0.982862 Standard Error 0.256429 Observations 29
 ANOVA df SS MS F Significance F Regression 1 105.6587 105.6587 1606.838 1.35E-25 Residual 27 1.775403 0.065756 Total 28 107.4341
 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept -111.882 2.994322 -37.3646 8.73E-25 -118.026 -105.738 year 0.061589 0.001536 40.08539 1.35E-25 0.058436 0.064741

From the above Excel output, the regression line of log(oil) on year is as below:

Assessing the goodness of fit for this model:

Recall: R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination.

R-squared is always between 0 and 100%:

• 0% indicates that the model explains none of the variability of the response data around its mean.
• 100% indicates that the model explains all the variability of the response data around its mean.

In general, the higher the R-squared, the better the model fits your data.

From the above Excel output, the coefficient of determination: R2 = 0.9835 = 98%.

It indicates that he overall fitting of this linear regression model is good.

d) Using Residual analysis to test the assumptions of the regression model:

From the output of Excel:

 Observation Predicted log(barrels) Residuals Standard Residuals 1 3.905435 -0.50424 -2.00247 2 4.521324 -0.17752 -0.70498 3 5.137213 -0.13327 -0.52924 4 5.445158 -0.07452 -0.29594 5 5.753102 0.039912 0.1585 6 6.061047 0.007379 0.029304 7 6.368991 0.16625 0.660226 8 6.676935 0.297543 1.181628 9 6.98488 0.267882 1.063836 10 7.292824 0.118732 0.471518 11 7.600769 0.072454 0.287736 12 7.908713 -0.04737 -0.18813 13 8.216658 0.026888 0.106778 14 8.524602 0.110552 0.439032 15 8.832547 0.113046 0.448939 16 8.955725 0.136057 0.540322 17 9.078902 0.161967 0.643217 18 9.20208 0.191914 0.762145 19 9.325258 0.228956 0.909248 20 9.448436 0.274129 1.088644 21 9.571614 0.258443 1.026349 22 9.694791 0.227959 0.905291 23 9.817969 0.094875 0.376773 24 9.941147 0.054099 0.214842 25 10.06432 -0.07824 -0.31073 26 10.1875 -0.31391 -1.24661 27 10.31068 -0.41538 -1.64958 28 10.43386 -0.51815 -2.0577 29 10.55704 -0.58645 -2.32896

i) Linearity of Regression model:

Linearity of the regression model can be obtained by plotting the residuals on the vertical axis against the corresponding values of independent variable (Year) on the horizontal axis.

The above residual plot shows non-linearity. Therefore, it indicates the violation of the Linearity assumption.

ii) Assumption of Homoscedasticity (Constant Error Variance):

The assumption of Homoscedasticity can be understood by examining the graph between residuals and the fitted values.

In this plot the residuals are scattered randomly around zero, hence, the errors have constant variance or do not violate the assumption of homoscedasticity.

iii) Assumption of Independence of Error:

This assumption is particularly important when the data is collected over a period of time. Residual versus time graph can be plotted to ascertain the assumption of independence of error.

The above plot indicates the independence of error, i.e., this assumption is not violated.

iv) Assumption of Normality of Error:

This assumption can be tested by examining the normal probability plot of residuals.

The normal probability plot of the residuals should be roughly follow a straight line for meeting the assumption of normality. A straight line connecting the residuals indicates that the residuals are normally distributed.

Hence, this graph shows clearly that one of the standard assumptions is violated and the violated assumption is the Assumption of Linearity of Regression model.

#### Earn Coins

Coins can be redeemed for fabulous gifts.