Question

Why should we remove non-significant variables from multiple linear regression models? What problems may arise if we keep them in the model?

Homework Answers

Answer #1

The data alone cannot reliably tell you which model is "better" unless you use AIC in a highly structured way (e.g., on a pre-specified large group of variables). Moreover, removing insignificant variables and then refitting invalidates the estimate of σ² and all the P-values, standard errors, and confidence limits of the final model, because those formulas assume the set of predictors was fixed before the data were examined. For the same reason, the formula for adjusted R² is no longer valid.
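To illustrate the point about invalid P-values, here is a small simulation sketch in Python (NumPy is assumed to be available). Every predictor is pure noise, so no predictor has a real effect. The sketch drops predictors whose full-model P-value exceeds 0.05, refits on the survivors, and counts how often the refitted model still reports at least one "significant" predictor. The helper names, the sample sizes, and the normal approximation to the t distribution (reasonable here because the residual degrees of freedom are large) are illustrative assumptions, not part of the original answer.

```python
import math
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS with an intercept; return the t statistics of the slope coefficients."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)             # usual estimate of sigma^2
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    return (beta / se)[1:]                            # drop the intercept

def two_sided_p(t):
    """Normal approximation to the two-sided P-value (assumes large residual df)."""
    return math.erfc(abs(t) / math.sqrt(2))

def select_then_refit(rng, n=200, p=20, alpha=0.05):
    """y is pure noise, so every 'significant' result below is a false positive."""
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    t_full = ols_t_stats(X, y)
    keep = [j for j in range(p) if two_sided_p(t_full[j]) < alpha]
    if not keep:
        return 0
    t_refit = ols_t_stats(X[:, keep], y)              # refit as if 'keep' were pre-specified
    return sum(two_sided_p(t) < alpha for t in t_refit)

rng = np.random.default_rng(0)
reps = 500
hits = [select_then_refit(rng) for _ in range(reps)]
frac_any = sum(h > 0 for h in hits) / reps
print(f"noise-only datasets with >= 1 'significant' predictor after refit: {frac_any:.2f}")
```

A single pre-specified 5% test would flag a noise variable about 5% of the time. Here the printed fraction lands far above that, close to the 1 - 0.95^20 (about 0.64) chance that at least one of the 20 noise variables passes the first screen, even though the refitted model's P-values look perfectly ordinary.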

That said, there are two common reasons for dropping or shrinking variables. The first is prediction accuracy: a model that keeps all the variables often has low bias but large variance. Prediction accuracy can sometimes be improved by shrinking some coefficients toward zero or setting them exactly to zero. Doing so sacrifices a little bias to reduce the variance of the predicted values, and hence may improve overall prediction accuracy.
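The bias-variance trade-off can be seen in a short simulation sketch (Python with NumPy, assumed to be available). Only 2 of 30 predictors carry signal. The sketch compares test-set mean squared error for (a) OLS on all 30 predictors, (b) OLS on the two truly relevant predictors, and (c) ridge regression, one common way of shrinking coefficients, with an arbitrarily chosen penalty λ = 10. Option (b) is an oracle that is not available in practice; it is included only to show how much variance the extra predictors add. All sizes, coefficient values, and λ are illustrative assumptions.

```python
import numpy as np

def simulate_once(rng, n_train=50, n_test=1000, p=30, n_signal=2, lam=10.0):
    """One replication: return test MSE for full OLS, oracle-subset OLS, and ridge."""
    beta = np.zeros(p)
    beta[:n_signal] = 1.0                             # only the first n_signal predictors matter

    def draw(n):
        X = rng.standard_normal((n, p))
        return X, X @ beta + rng.standard_normal(n)    # noise variance = 1

    X_tr, y_tr = draw(n_train)
    X_te, y_te = draw(n_test)

    def ols_predict(cols):
        # OLS with an intercept, using only the listed columns
        D_tr = np.column_stack([np.ones(n_train), X_tr[:, cols]])
        coef, *_ = np.linalg.lstsq(D_tr, y_tr, rcond=None)
        return np.column_stack([np.ones(n_test), X_te[:, cols]]) @ coef

    def ridge_predict():
        # ridge on centred data; the intercept is not penalised
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - x_mean
        coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_tr - y_mean))
        return y_mean + (X_te - x_mean) @ coef

    def mse(pred):
        return float(np.mean((y_te - pred) ** 2))

    return (mse(ols_predict(list(range(p)))),
            mse(ols_predict(list(range(n_signal)))),
            mse(ridge_predict()))

rng = np.random.default_rng(1)
results = np.array([simulate_once(rng) for _ in range(200)])
full_mse, subset_mse, ridge_mse = results.mean(axis=0)
print(f"full OLS: {full_mse:.2f}   oracle subset: {subset_mse:.2f}   ridge: {ridge_mse:.2f}")
```

The irreducible error here is 1. With only 50 training rows, the 28 useless coefficients inflate prediction variance, so the full model's test MSE should come out clearly above both the oracle subset and the shrunken ridge fit.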

The second reason is interpretation. With a large number of predictors, we often want to identify a smaller subset that exhibits the strongest effects.
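For the interpretation goal, the lasso is one standard technique that both shrinks coefficients and sets some of them exactly to zero, leaving a smaller subset of predictors. Below is a minimal coordinate-descent sketch in Python/NumPy, assuming standardized predictors, a centred response, and a fixed, hand-picked penalty (lam=0.5); in practice the penalty would usually be chosen by cross-validation. The data, true coefficients, and function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, g):
    """Shrink z toward zero by g, returning exactly 0 if |z| <= g."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """Minimise (1/(2n))||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    Assumes columns of X have mean 0 and variance 1 (ddof=0) and y is centred."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                                       # residual y - Xb, with b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            z = X[:, j] @ r / n + old                  # correlation with partial residual
            b[j] = soft_threshold(z, lam)
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)            # keep the residual up to date
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)               # standardize columns
beta_true = np.array([3.0, -2.0] + [0.0] * (p - 2))    # only two real effects
y = X @ beta_true + rng.standard_normal(n)
y = y - y.mean()

coef = lasso_cd(X, y, lam=0.5)
selected = np.flatnonzero(coef)
print("lasso coefficients:", np.round(coef, 2))
print("selected predictors:", selected.tolist())
```

With this setup the lasso should keep the two predictors that carry real effects (with coefficients shrunk toward zero) and set the eight noise coefficients exactly to zero, giving the smaller, easier-to-interpret model.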
