How do data analysts measure the effect of an outlier on results? Once the analyst can measure their effects, what are the benefits of removing the outliers? Provide specific examples to illustrate your ideas.
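An outlier is a value that differs greatly from the other values in your data set. Even a single outlier can skew your results, and one way to measure its effect is to compare summary statistics computed with and without it.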
Let's examine what can happen to a data set with outliers. For the sample data set:
1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4
We find the following mean, median, mode, and standard deviation:
Mean = 2.45
Median = 2
Mode = 2
Standard Deviation (sample) = 1.04
If we add an outlier to the data set:
1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 400
The new values of our statistics are:
Mean = 35.58
Median = 2.5
Mode = 2
Standard Deviation (sample) = 114.77
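The comparison can be reproduced with a short script. This is a minimal sketch using Python's standard statistics module. The helper name summarize is made up for illustration, and it assumes the sample standard deviation (dividing by n - 1) is the one intended.

```python
import statistics

def summarize(values):
    """Return mean, median, mode, and sample standard deviation (rounded to 2 places)."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": round(statistics.median(values), 2),
        "mode": statistics.mode(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "stdev": round(statistics.stdev(values), 2),
    }

base = [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
with_outlier = base + [400]  # the same data plus one extreme value

before = summarize(base)
after = summarize(with_outlier)

# Print each statistic without and with the outlier, plus the difference.
for name in before:
    change = after[name] - before[name]
    print(f"{name}: {before[name]} -> {after[name]} (change {change:+.2f})")
```

The size of each change is a simple measure of how much the outlier affects that statistic.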
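As you can see, a single outlier can have a significant effect on your mean and standard deviation, while the median and mode barely change. Because of this, we must take steps to identify outliers and remove them from our data sets, so that the mean and standard deviation describe the typical values again.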
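The answer above does not say how to decide which values count as outliers. One common convention, used here only as an assumption, is the 1.5 x IQR rule: flag anything more than 1.5 interquartile ranges below the first quartile or above the third quartile. The function name split_by_iqr and the parameter k are made up for this sketch.

```python
import statistics

def split_by_iqr(values, k=1.5):
    """Split values into (kept, flagged) using the k * IQR rule.

    Assumption: quartiles come from statistics.quantiles with its default
    method; other quartile conventions give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged

data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 400]
kept, flagged = split_by_iqr(data)

print("flagged as outliers:", flagged)
print("mean with outlier:   ", round(statistics.mean(data), 2))
print("mean without outlier:", round(statistics.mean(kept), 2))
```

Comparing the mean before and after removal shows the benefit directly: the statistic moves back toward the bulk of the data.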