Problem 1: Oil Production Data: The Data in the following are the annual world crude oil production in millions of barrels for the period 1880-1988. The data are taken from Moore and McCabe( 1993, p. 147).
Data:（SAS）
year barrels
1880 30
1890 77
1900 149
1905 215
1910 328
1915 432
1920 689
1925 1069
1930 1412
1935 1655
1940 2150
1945 2595
1950 3803
1955 5626
1960 7674
1962 8882
1964 10310
1966 12016
1968 14104
1970 16690
1972 18584
1974 20389
1976 20188
1978 21922
1980 21722
1982 19411
1984 19837
1986 20246
1988 21388
Answer:
The given data is as below:
year | barrels |
1880 | 30 |
1890 | 77 |
1900 | 149 |
1905 | 215 |
1910 | 328 |
1915 | 432 |
1920 | 689 |
1925 | 1069 |
1930 | 1412 |
1935 | 1655 |
1940 | 2150 |
1945 | 2595 |
1950 | 3803 |
1955 | 5626 |
1960 | 7674 |
1962 | 8882 |
1964 | 10310 |
1966 | 12016 |
1968 | 14104 |
1970 | 16690 |
1972 | 18584 |
1974 | 20389 |
1976 | 20188 |
1978 | 21922 |
1980 | 21722 |
1982 | 19411 |
1984 | 19837 |
1986 | 20246 |
1988 | 21388 |
a) Scatter plot of the oil production variable (OIL) versus Year:
Procedure for creating the Scatter Plot using MS-Excel:
b) Scatter plot of log(OIL) versus Year:
After logarithmic transformation of the OIL variable (barrels) the data becomes:
year | log(barrels) |
1880 | 3.401197 |
1890 | 4.343805 |
1900 | 5.003946 |
1905 | 5.370638 |
1910 | 5.793014 |
1915 | 6.068426 |
1920 | 6.535241 |
1925 | 6.974479 |
1930 | 7.252762 |
1935 | 7.411556 |
1940 | 7.673223 |
1945 | 7.861342 |
1950 | 8.243546 |
1955 | 8.635154 |
1960 | 8.945593 |
1962 | 9.091782 |
1964 | 9.24087 |
1966 | 9.393994 |
1968 | 9.554214 |
1970 | 9.722565 |
1972 | 9.830056 |
1974 | 9.922751 |
1976 | 9.912844 |
1978 | 9.995246 |
1980 | 9.986081 |
1982 | 9.873595 |
1984 | 9.895304 |
1986 | 9.915713 |
1988 | 9.970585 |
Procedure for creating the Scatter Plot using MS-Excel:
c) Fitting of linear regression line of log (OIL) on Year
Regression Analysis using MS-Excel:
Enter the transformed data into excel worksheet as follows:
year | log(barrels) |
1880 | 3.401197 |
1890 | 4.343805 |
1900 | 5.003946 |
1905 | 5.370638 |
1910 | 5.793014 |
1915 | 6.068426 |
1920 | 6.535241 |
1925 | 6.974479 |
1930 | 7.252762 |
1935 | 7.411556 |
1940 | 7.673223 |
1945 | 7.861342 |
1950 | 8.243546 |
1955 | 8.635154 |
1960 | 8.945593 |
1962 | 9.091782 |
1964 | 9.24087 |
1966 | 9.393994 |
1968 | 9.554214 |
1970 | 9.722565 |
1972 | 9.830056 |
1974 | 9.922751 |
1976 | 9.912844 |
1978 | 9.995246 |
1980 | 9.986081 |
1982 | 9.873595 |
1984 | 9.895304 |
1986 | 9.915713 |
1988 | 9.970585 |
Procedure:
1.Enter the data into Excel sheet. |
2.If this is the first time you have used an Excel add-in, click the File tab, otherwise skip to step 7. |
3.Click Options from the list on the left. |
4.Select Add-ins in the Excel Options box. |
5.In the Add-in list box, select Analysis Toolbox-VBA from the Inactive Application Add-ins list. |
6.Click OK. |
7.Then select Data/ Data Analysis tab from the menu bar. |
8.The Data Analysis dialog box will appear on the screen. |
9.From the Data Analysis dialog box, select Regression and click OK. |
10.The Regression dialog box will appear on the screen. |
11.Place independent variables (Year) in Input X Range and place dependent variable (Log(barrels)) in Input Y Range. Select the Labels check-box. |
12.Select the Residuals and Standardized Residuals. |
13.Click OK. |
Regression Analysis output:
Regression Statistics | |
Multiple R | 0.991703 |
R Square | 0.983474 |
Adjusted R Square | 0.982862 |
Standard Error | 0.256429 |
Observations | 29 |
ANOVA | |||||
df | SS | MS | F | Significance F | |
Regression | 1 | 105.6587 | 105.6587 | 1606.838 | 1.35E-25 |
Residual | 27 | 1.775403 | 0.065756 | ||
Total | 28 | 107.4341 |
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -111.882 | 2.994322 | -37.3646 | 8.73E-25 | -118.026 | -105.738 |
year | 0.061589 | 0.001536 | 40.08539 | 1.35E-25 | 0.058436 | 0.064741 |
From the above Excel output, the regression line of log(oil) on year is as below:
Assessing the goodness of fit for this model:
Recall: R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination.
R-squared is always between 0 and 100%:
In general, the higher the R-squared, the better the model fits your data.
From the above Excel output, the coefficient of determination: R^{2} = 0.9835 = 98%.
It indicates that he overall fitting of this linear regression model is good.
d) Using Residual analysis to test the assumptions of the regression model:
From the output of Excel:
Observation | Predicted log(barrels) | Residuals | Standard Residuals |
1 | 3.905435 | -0.50424 | -2.00247 |
2 | 4.521324 | -0.17752 | -0.70498 |
3 | 5.137213 | -0.13327 | -0.52924 |
4 | 5.445158 | -0.07452 | -0.29594 |
5 | 5.753102 | 0.039912 | 0.1585 |
6 | 6.061047 | 0.007379 | 0.029304 |
7 | 6.368991 | 0.16625 | 0.660226 |
8 | 6.676935 | 0.297543 | 1.181628 |
9 | 6.98488 | 0.267882 | 1.063836 |
10 | 7.292824 | 0.118732 | 0.471518 |
11 | 7.600769 | 0.072454 | 0.287736 |
12 | 7.908713 | -0.04737 | -0.18813 |
13 | 8.216658 | 0.026888 | 0.106778 |
14 | 8.524602 | 0.110552 | 0.439032 |
15 | 8.832547 | 0.113046 | 0.448939 |
16 | 8.955725 | 0.136057 | 0.540322 |
17 | 9.078902 | 0.161967 | 0.643217 |
18 | 9.20208 | 0.191914 | 0.762145 |
19 | 9.325258 | 0.228956 | 0.909248 |
20 | 9.448436 | 0.274129 | 1.088644 |
21 | 9.571614 | 0.258443 | 1.026349 |
22 | 9.694791 | 0.227959 | 0.905291 |
23 | 9.817969 | 0.094875 | 0.376773 |
24 | 9.941147 | 0.054099 | 0.214842 |
25 | 10.06432 | -0.07824 | -0.31073 |
26 | 10.1875 | -0.31391 | -1.24661 |
27 | 10.31068 | -0.41538 | -1.64958 |
28 | 10.43386 | -0.51815 | -2.0577 |
29 | 10.55704 | -0.58645 | -2.32896 |
i) Linearity of Regression model:
Linearity of the regression model can be obtained by plotting the residuals on the vertical axis against the corresponding values of independent variable (Year) on the horizontal axis.
The above residual plot shows non-linearity. Therefore, it indicates the violation of the Linearity assumption.
ii) Assumption of Homoscedasticity (Constant Error Variance):
The assumption of Homoscedasticity can be understood by examining the graph between residuals and the fitted values.
In this plot the residuals are scattered randomly around zero, hence, the errors have constant variance or do not violate the assumption of homoscedasticity.
iii) Assumption of Independence of Error:
This assumption is particularly important when the data is collected over a period of time. Residual versus time graph can be plotted to ascertain the assumption of independence of error.
The above plot indicates the independence of error, i.e., this assumption is not violated.
iv) Assumption of Normality of Error:
This assumption can be tested by examining the normal probability plot of residuals.
The normal probability plot of the residuals should be roughly follow a straight line for meeting the assumption of normality. A straight line connecting the residuals indicates that the residuals are normally distributed.
Hence, this graph shows clearly that one of the standard assumptions is violated and the violated assumption is the Assumption of Linearity of Regression model.
Get Answers For Free
Most questions answered within 1 hours.