A dataset has values that are missing not at random (MNAR): the probability that a value is missing depends on the value itself.
In this case, what happens if the missing values are imputed with the mean strategy?
The goal of mean imputation is to replace missing data with a statistical estimate of the missing values.
In mean substitution, the mean of the observed values of a variable is used in place of each missing value of that same variable. This has the benefit of not changing the sample mean for that variable. The theoretical rationale is that the mean is a reasonable estimate for a randomly selected observation from a normal distribution. However, when values are missing not at random, the observed values are not a representative sample, so the observed mean is itself a biased estimate, and imputing it carries that bias into the completed data. Especially when the number of missing values differs greatly across variables, the size of this bias may vary from variable to variable. Two further major drawbacks of this method are distortion of the original variance, which shrinks because every imputed value sits exactly at the mean, and distortion of the covariance with the remaining variables in the dataset.
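The effects described above can be illustrated with a small simulation. This is only a sketch under assumed conditions, not part of the original question: the variables `x` and `y`, the normal distribution parameters, and the logistic missingness rule (larger values of `x` are more likely to be missing) are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 10_000

# Hypothetical data: x will receive missing values, y is correlated with x.
x = rng.normal(loc=50.0, scale=10.0, size=n)
y = 2.0 * x + rng.normal(scale=5.0, size=n)

# Assumed MNAR mechanism: the larger x is, the more likely it is to be missing.
p_missing = 1.0 / (1.0 + np.exp(-(x - 55.0) / 5.0))
missing = rng.random(n) < p_missing

# Mean imputation: fill each missing entry with the mean of the observed values.
observed_mean = x[~missing].mean()
x_imputed = np.where(missing, observed_mean, x)

# Compare the complete (normally unknowable) data with the imputed data.
true_cov = np.cov(x, y)[0, 1]
imputed_cov = np.cov(x_imputed, y)[0, 1]

print(f"fraction missing    : {missing.mean():.2f}")
print(f"mean     true/imputed: {x.mean():.2f} / {x_imputed.mean():.2f}")
print(f"variance true/imputed: {x.var():.2f} / {x_imputed.var():.2f}")
print(f"cov(x,y) true/imputed: {true_cov:.2f} / {imputed_cov:.2f}")
```

In this setup the imputed mean equals the observed mean, which lies below the true mean because large values were preferentially removed, and both the variance of `x` and its covariance with `y` come out smaller than in the complete data. With a different missingness rule the direction of the bias in the mean could be reversed.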