Question

There is a dataset with missing values that are missing not at random (MNAR), and the probability of missing is related to the values themselves.

Regarding this, what would happen when imputing the missing values with the mean strategy?

Homework Answers

Answer #1

The goal of mean imputation is to replace missing data with a statistical estimate of the missing values.

In mean substitution, the mean of the observed values of a variable is used in place of each missing value of that variable. This leaves the sample mean of the observed values unchanged, which is a benefit only when the data are missing completely at random. The rationale is that the mean is a reasonable estimate for a randomly selected observation from a normal distribution. Under MNAR, however, the observed values are not a random sample: if, for example, larger values are more likely to be missing, the observed mean underestimates the true mean, and filling in that mean carries the bias into the completed data. The size of the bias can also differ from variable to variable when the variables have very different numbers of missing values. Two further drawbacks are that mean substitution shrinks the variance of the imputed variable and distorts its covariance with the remaining variables in the dataset.
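The effect can be checked with a small simulation. The sketch below is illustrative only and is not part of the original answer: it assumes standard normal data and a made-up MNAR mechanism in which the chance of a value being missing grows with the value itself, P(missing | x) = 1 / (1 + exp(-2x)). The function name, sample size, and seed are all hypothetical choices.

```python
import math
import random
import statistics


def simulate_mnar_mean_imputation(n=20000, seed=0):
    """Simulate MNAR missingness and mean imputation on standard normal data."""
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Assumed MNAR mechanism (for illustration): larger values are more
    # likely to be missing, with P(missing | x) = 1 / (1 + exp(-2x)).
    observed = [
        x for x in full
        if rng.random() >= 1.0 / (1.0 + math.exp(-2.0 * x))
    ]

    # Mean imputation: every missing entry is replaced by the observed mean.
    observed_mean = statistics.fmean(observed)
    imputed = observed + [observed_mean] * (n - len(observed))

    return {
        "true_mean": statistics.fmean(full),
        "observed_mean": observed_mean,
        "imputed_mean": statistics.fmean(imputed),
        "true_var": statistics.pvariance(full),
        "observed_var": statistics.pvariance(observed),
        "imputed_var": statistics.pvariance(imputed),
    }


result = simulate_mnar_mean_imputation()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the imputed mean stays equal to the (biased) observed mean rather than moving toward the true mean, and the variance after imputation is smaller than both the observed and the true variance, because the imputed entries add no spread.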
