Consider a data set where the objects are images from a weather satellite and each image consists of one million pixels. (Assume that each pixel consists of a real value representing the brightness. Also, assume that the images are snapshots of different areas and do not represent images of the same area at successive intervals in time.) The data can be represented as record data, where each image is a record (object) and each pixel is an attribute.
a. (10 pts) What are three techniques for handling missing values?
b. (10 pts) An image often has missing values for scattered pixels. (A pixel is missing, but those around it are not.) Which of the three techniques would be the most appropriate for this situation and why?
a)
(i) Replace missing values by IMPUTATION (predict them from the other attributes).
(ii) Replace missing values with an INTERPOLATED ESTIMATE.
(iii) Replace missing values with the MEAN of the known values.
b)
REPLACING MISSING VALUES WITH IMPUTATION- Imputation uses the features to model each other, so that when one is missing, the others can be used to fill in the blank in a reasonable way. This is a very powerful method when the features are related.
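REPLACING MISSING VALUES WITH AN INTERPOLATED ESTIMATE- Interpolation is a statistical method by which related known values are used to estimate an unknown value. For a scattered missing pixel whose neighbors are known, the neighboring pixel values serve as the related known values, which makes interpolation the most appropriate of the three techniques for this situation.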
REPLACING MISSING VALUES WITH THE MEAN- In this case, replacing the missing values with a value that represents the existing distribution, such as the mean, is a reasonable approach.
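To make the difference concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, under several assumptions that are not in the question: missing pixels are marked as NaN, "interpolation" is taken to mean averaging the known pixels in the surrounding 3x3 window, and "the mean" is taken to be the mean of all known pixels in the same image. The function names `fill_from_neighbors` and `fill_with_mean` are hypothetical.

```python
import numpy as np

def fill_from_neighbors(image):
    """Replace each NaN pixel with the mean of the known pixels around it.

    Assumption: a simple 3x3 neighborhood average stands in for interpolation.
    """
    filled = image.copy()
    rows, cols = image.shape
    for r, c in zip(*np.where(np.isnan(image))):
        # Take the 3x3 window around the pixel, clipped at the image border.
        window = image[max(r - 1, 0):min(r + 2, rows),
                       max(c - 1, 0):min(c + 2, cols)]
        known = window[~np.isnan(window)]
        if known.size:  # leave the pixel missing if no neighbor is known
            filled[r, c] = known.mean()
    return filled

def fill_with_mean(image):
    """Replace each NaN pixel with the mean of all known pixels in the image."""
    return np.where(np.isnan(image), np.nanmean(image), image)

# Toy brightness image: a dark region (10) next to a bright column (90),
# with one scattered missing pixel inside the dark region.
image = np.array([
    [10.0, 10.0,   10.0, 90.0],
    [10.0, np.nan, 10.0, 90.0],
    [10.0, 10.0,   10.0, 90.0],
])

print(fill_from_neighbors(image)[1, 1])  # estimate from the local neighborhood
print(fill_with_mean(image)[1, 1])       # estimate from the whole image
```

In this toy image the neighborhood estimate matches the surrounding dark region, while the image-wide mean is pulled upward by the bright column far from the missing pixel. That is the intuition behind preferring interpolation when only scattered pixels are missing.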