I am working on a data analysis project using survey data. I am cleaning up the following variables; how should I handle the -1s?
Do I delete them, compute the mean, and then replace the -1s with that mean?
Or do I keep the -1s, compute the mean (with the -1s included), and then replace the -1s with that mean?
Or do I delete them completely?
My professor said something about either deleting them altogether or, if the -1s make up less than 10% of the values, replacing them with the mean. I didn't really understand this.
Research question: Is there a correlation between how often someone uses the internet and their concern for privacy (agreement with the statement "consumers have lost control over how personal information is collected and used by companies")?
Survey codes: QpRi2; Q9a. This is part of the dataset:
Use_Internet | Concern_Privacy |
-1 | -1 |
1 | 2 |
-1 | 1 |
1 | 1 |
1 | 2 |
-1 | 1 |
1 | 1 |
2 | 1 |
1 | 2 |
1 | 1 |
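A quick way to apply the 10% rule is to measure the share of -1s in each column before deciding anything. The sketch below is a minimal illustration that assumes -1 is the survey's code for a missing or invalid answer and uses only the ten sample rows shown above; the names `MISSING`, `THRESHOLD` and `missing_share` are hypothetical. On these rows Use_Internet is 30% -1s and Concern_Privacy is exactly 10%. The rule as stated does not say which side exactly 10% falls on, so treating it as imputable here is an assumption.

```python
# Sample rows from the question: (Use_Internet, Concern_Privacy)
rows = [(-1, -1), (1, 2), (-1, 1), (1, 1), (1, 2),
        (-1, 1), (1, 1), (2, 1), (1, 2), (1, 1)]

MISSING = -1      # assumption: -1 marks a missing or invalid answer
THRESHOLD = 0.10  # the professor's 10% rule of thumb


def missing_share(values):
    """Fraction of entries equal to the missing-value code."""
    return sum(v == MISSING for v in values) / len(values)


use_internet = [r[0] for r in rows]
concern_privacy = [r[1] for r in rows]

for name, col in [("Use_Internet", use_internet),
                  ("Concern_Privacy", concern_privacy)]:
    share = missing_share(col)
    # Exactly 10% is treated as "impute" here (an assumption, see above)
    action = "delete" if share > THRESHOLD else "mean-impute"
    print(f"{name}: {share:.0%} missing -> {action}")
```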
Your professor means that there are two ways to handle the -1s, depending on how many there are:
1. If there are a significant number of -1s (more than 10% of the column), delete those data points. If you impute that many values instead, the results will be affected significantly, because too much of the column will consist of the same filled-in value rather than real answers.
2. If there are only a few -1s (less than 10%), you can use mean imputation: compute the mean of the column without the -1s (invalid values are never included in the mean), then replace every -1 in that column with that mean. For example, to replace the -1s in Use_Internet, compute the mean of that column using only its valid values, then replace each -1 with that value. This way the data is cleaned up.
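Both options can be sketched in plain Python on the sample rows from the question. This is a minimal illustration, not a prescribed method: `drop_missing` and `impute_mean` are hypothetical helper names, -1 is assumed to be the missing-value code, and "delete" is read as dropping the whole respondent (listwise deletion), since a correlation between two variables needs both answers from each respondent. The last lines also show why the -1s must be left out of the mean.

```python
from statistics import mean

MISSING = -1  # assumption: -1 marks a missing or invalid answer

# (Use_Internet, Concern_Privacy) pairs from the sample in the question
rows = [(-1, -1), (1, 2), (-1, 1), (1, 1), (1, 2),
        (-1, 1), (1, 1), (2, 1), (1, 2), (1, 1)]


def drop_missing(pairs):
    """Option 1 (listwise deletion): keep respondents who answered both items."""
    return [p for p in pairs if MISSING not in p]


def impute_mean(values):
    """Option 2: replace the missing code with the mean of the valid entries only."""
    valid = [v for v in values if v != MISSING]
    fill = mean(valid)
    return [fill if v == MISSING else v for v in values], fill


complete = drop_missing(rows)
print(len(rows), "rows ->", len(complete), "complete rows")

concern_privacy = [r[1] for r in rows]
print(mean(concern_privacy))  # 1.1: the -1 drags the mean down
filled, fill = impute_mean(concern_privacy)
print(fill)                   # 1.3333333333333333 (= 12 / 9, valid answers only)
print(filled)
```

On this sample the two options interact: the only -1 in Concern_Privacy is in the first row, which listwise deletion already removes because Use_Internet is also -1 there. For Use_Internet, keeping the -1s distorts the mean even more: 0.5 with them versus 8/7 ≈ 1.14 without, and 0.5 is lower than any real answer in the column. Note also that mean imputation can produce non-integer values such as 1.33 for a variable coded in whole numbers.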