A survey is divided into three parts:
Part 1 includes socioeconomic status (SES) information: parent’s education, employment status (currently employed, yes or no), income level, and whether the student receives free or reduced lunch at school.
Part 2 asks about the environment at home: time spent watching TV or playing video games at home, having a TV in their rooms, having a computer available to do assignments at home, Internet access at home, etc.
Part 3 includes questions related to the safety of the neighborhood environment, for example whether the students walk to school, how safe they feel while walking to and from school, whether they feel safe at school, whether there are a lot of fights at school, etc.
Your purpose in collecting these data is to create a model that can help predict academic achievement in high school students using some of these variables.
Question: What considerations must you make to determine which variables should be included in the model? Include the concept of multicollinearity.
Aim: Fit a model to predict academic achievement in high school students.
Enumerated variables: Socioeconomic status, environment at home, and safety of the neighborhood environment. Each of the three categories contains several variables.
Answer
Consideration: Do not include too many variables; keep only those that are relevant to academic achievement. The coefficient of determination (R²) can help here, but because R² never decreases when a variable is added, compare candidate models using adjusted R², which penalizes extra predictors. The next issue is multicollinearity. Multicollinearity means near-linear dependence among the explanatory variables; it inflates the variances of the ordinary least squares (OLS) estimates and makes the individual coefficients unstable. In this survey, for example, parent’s education, employment status, income level, and free or reduced lunch are all SES measures and are likely to be correlated with one another. So you should ensure that the chosen variables are free from severe multicollinearity. It can be diagnosed using measures such as the variance inflation factor (VIF).
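As a rough illustration of the VIF check, here is a minimal Python sketch using only NumPy. It regresses each predictor on the others and computes VIF_j = 1 / (1 - R_j²). The data are synthetic and the variable names (parent_education, income, tv_hours) are hypothetical stand-ins for survey items, not real survey results. The cutoff of 10 is a common rule of thumb, not a fixed standard.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor for each column of X.

    X is an (n_samples, n_predictors) array of explanatory variables.
    Each column is regressed on all the others (with an intercept),
    and VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic data (assumption): income is built to be nearly collinear
# with parent_education, while tv_hours is independent of both.
rng = np.random.default_rng(0)
n = 500
parent_education = rng.normal(size=n)
income = 0.95 * parent_education + 0.1 * rng.normal(size=n)
tv_hours = rng.normal(size=n)

X = np.column_stack([parent_education, income, tv_hours])
names = ["parent_education", "income", "tv_hours"]
vifs = vif(X)

for name, value in zip(names, vifs):
    flag = "high" if value > 10 else "ok"  # rule-of-thumb cutoff
    print(f"{name}: VIF = {value:.1f} ({flag})")
```

In a case like this, one option is to keep only one of the two collinear SES variables, or to combine them into a single SES index, and then recompute the VIFs.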