When computing the outliers of a given dataset, we can find Q1 - 1.5*IQR and Q3 + 1.5*IQR, and any values outside this range are considered outliers.
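As an illustration of the fence rule, here is a minimal Python sketch. It assumes the quartiles are computed with `statistics.quantiles(..., method="inclusive")`. Textbooks and tools use slightly different quartile conventions, so the fences can shift a little depending on the method. The function names `iqr_fences` and `find_outliers` and the sample data are hypothetical.

```python
import statistics

def iqr_fences(data, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" interpolates linearly between order statistics;
    # other quartile conventions may give slightly different fences.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, k=1.5):
    """Return every value lying strictly outside the fences."""
    low, high = iqr_fences(data, k)
    return [x for x in data if x < low or x > high]

data = [2, 4, 5, 5, 6, 7, 8, 30]   # hypothetical sample
print(iqr_fences(data))     # (1.0, 11.0)
print(find_outliers(data))  # [30]
```

Here Q1 = 4.75 and Q3 = 7.25, so IQR = 2.5 and the fences are 1.0 and 11.0. Only 30 falls outside them.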
What if we simply remove the smallest and largest values from the dataset? If there is more than one smallest/largest value, we just remove one of them. Would this be valid?
No. Simply removing the smallest and largest values from the dataset is not a valid way to handle outliers, because we first need to find the range of values at the lower and upper ends that we can accept (the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR). Once we have that range, we can determine whether to remove only the smallest value, only the largest value, both, or neither, since the extreme values are not necessarily outliers.
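To see why the range has to be checked first, the sketch below compares the naive trim with the fence rule. As before, it assumes "inclusive" quartiles. The function names `trim_extremes` and `remove_iqr_outliers` and the two sample datasets are hypothetical.

```python
import statistics

def trim_extremes(data):
    """Naive approach from the question: drop one copy of the min and of the max."""
    return sorted(data)[1:-1]

def remove_iqr_outliers(data, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if low <= x <= high]

for data in ([2, 4, 5, 5, 6, 7, 8, 30], [1, 2, 3, 4, 5]):
    print("trim:", trim_extremes(data), "IQR rule:", remove_iqr_outliers(data))
# trim: [4, 5, 5, 6, 7, 8] IQR rule: [2, 4, 5, 5, 6, 7, 8]
# trim: [2, 3, 4] IQR rule: [1, 2, 3, 4, 5]
```

In the first sample only the largest value is an outlier, so trimming wrongly discards 2. In the second sample there are no outliers at all, yet trimming still removes two valid values.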
If there is more than one smallest/largest value, removing just one of them is not valid either. The remaining copy is just as extreme, and outliers pull the mean toward them, so it is better to remove all of the outliers rather than just one of them.
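The duplicate case can be checked numerically. This sketch assumes the same "inclusive" quartile convention. The `iqr_filter` helper and the sample with a repeated maximum of 50 are hypothetical. Both copies of 50 lie above the upper fence (11.25 here).

```python
import statistics

def iqr_filter(data, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in data if q1 - k * iqr <= x <= q3 + k * iqr]

data = [3, 4, 5, 5, 6, 6, 7, 7, 8, 50, 50]  # hypothetical sample, repeated maximum

# Naive approach: drop one copy of the smallest and one copy of the largest value.
naive = sorted(data)[1:-1]
filtered = iqr_filter(data)

print(round(statistics.mean(data), 2))                # 13.73
print(naive, round(statistics.mean(naive), 2))        # [4, 5, 5, 6, 6, 7, 7, 8, 50] 10.89
print(filtered, round(statistics.mean(filtered), 2))  # [3, 4, 5, 5, 6, 6, 7, 7, 8] 5.67
```

Removing only one copy of 50 leaves the other in place, and the mean stays far above the bulk of the data. Removing every value outside the fences brings it back in line.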