Question

2.) In this problem, we will perform multiple regression on the Boston housing data. The data set contains 506 records with 14 variables. The variable medv is the response variable.

Solve the following problems in R and print out the commands and outputs:

To access the data, use

library(MASS)

data(Boston)

(a) First, perform a multiple regression with all the variables. What can you say about the significance of the variables based only on the p-values? Next, use the "step" function to perform backward selection using (1) the AIC criterion and (2) the BIC criterion, then compare the results. (By default, the step function in R performs variable selection based on the AIC criterion. Read the documentation to find out how to do the selection using the BIC criterion.)
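One possible way to set up part (a) is sketched below. It is not a posted answer, and the object names (full_fit, aic_fit, bic_fit) are made up for illustration. The sketch relies on the documented `k` argument of `step()`: the default `k = 2` gives AIC, and `k = log(n)` gives the penalty usually called BIC.

```r
library(MASS)   # provides the Boston data frame
data(Boston)

# Full model: medv regressed on the 13 remaining variables
full_fit <- lm(medv ~ ., data = Boston)
summary(full_fit)   # per-variable p-values are in the Pr(>|t|) column

n <- nrow(Boston)   # number of records, used for the BIC penalty

# (1) Backward selection with AIC (default penalty, k = 2)
aic_fit <- step(full_fit, direction = "backward", trace = 0)

# (2) Backward selection with BIC (penalty k = log(n))
bic_fit <- step(full_fit, direction = "backward", k = log(n), trace = 0)

# Compare the selected models
formula(aic_fit)
formula(bic_fit)
```

Setting `trace = 0` only suppresses the step-by-step log; drop it if the assignment requires the full output. Because `log(n)` is larger than 2 for this sample size, the BIC run penalizes each extra term more heavily, so it can only keep the same variables as the AIC run or fewer.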

(b) Now make a histogram of the response variable (use hist()) to see whether it is skewed. Using log(medv) as the response variable, perform the stepwise selection as before using both the AIC and BIC criteria. Compare with the previous results in terms of the selected variables and adjusted R².
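A similar sketch for part (b), again with illustrative object names (log_full, log_aic, log_bic) and assuming the same backward selection as in part (a). The adjusted R² is read from the `adj.r.squared` component of `summary()` on each fitted model.

```r
library(MASS)   # provides the Boston data frame
data(Boston)

# Histogram of the response to check for skewness
hist(Boston$medv, main = "Histogram of medv", xlab = "medv")

# Full model with the log-transformed response; "." excludes medv itself
log_full <- lm(log(medv) ~ ., data = Boston)

n <- nrow(Boston)

# Backward selection with AIC (k = 2) and BIC (k = log(n))
log_aic <- step(log_full, direction = "backward", trace = 0)
log_bic <- step(log_full, direction = "backward", k = log(n), trace = 0)

# Selected variables and adjusted R-squared for each model
formula(log_aic)
formula(log_bic)
summary(log_aic)$adj.r.squared
summary(log_bic)$adj.r.squared
```

When comparing with part (a), note that AIC and BIC values computed for medv and for log(medv) are on different scales and are not directly comparable. The comparison the question asks for is between the sets of selected variables and the adjusted R² values.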
