Question

# What are the goals of data screening? How can you identify and remedy the following? Errors...

What are the goals of data screening? How can you identify and remedy the following? Errors in data entry. Outliers. Missing data.

The goals of data screening are as follows:

• Accuracy of data entry- We have to check whether the data have been entered correctly, with no typing errors, collection errors, or similar mistakes.
• Dealing with missing data- We have to identify which values are missing from the collected data and analyze whether their number and effect are significant. If they are not significant, we can proceed with further computations. Otherwise, we have to collect or estimate the missing values (all or some of them, as possible and as required) using one of several approaches.
• Handling outliers- We have to examine the gathered data as a whole to spot any outliers and, where possible, cross-check them against the source. If cross-checking is not possible, we have to assess the effect of those outliers on the overall data and, if required, exclude them from further computations.
• Test of assumptions- Assumptions made earlier, such as normality, linearity, uniformity of variance (homoscedasticity), and symmetry, have to be checked as part of data screening.
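
As a rough sketch of checking the normality assumption, the snippet below computes sample skewness and excess kurtosis with simple moment estimators and converts each to a z-score using the approximate standard errors √(6/N) and √(24/N). The function name `skewness_kurtosis_z` is a hypothetical helper for illustration, not part of any particular package.

```python
import math
from statistics import fmean


def skewness_kurtosis_z(values):
    """Return (skewness, excess kurtosis, z_skew, z_kurt) for a list of numbers.

    Uses simple moment estimators; the standard errors sqrt(6/N) and sqrt(24/N)
    are large-sample approximations.
    """
    n = len(values)
    mean = fmean(values)
    # Central moments of order 2, 3 and 4
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    z_skew = skew / math.sqrt(6 / n)
    z_kurt = kurt / math.sqrt(24 / n)
    return skew, kurt, z_skew, z_kurt
```

For example, `skewness_kurtosis_z([1, 2, 3, 4, 5])` gives a skewness of 0 and an excess kurtosis of about -1.3, because the values are symmetric but flatter than a normal curve. A z-score beyond a conservative cutoff such as ±3.29 (two-tailed p < .001) is one common signal of non-normality; the cutoff is a convention, not a fixed rule.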

Errors in data entry-

We have to cross-check the data after entering them so that errors in data entry can be avoided (or reduced). In addition, whenever a value looks like an outlier, we have to take special care to check whether it was entered correctly.
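
One way to support this cross-checking is to screen the entered records automatically against a codebook. The sketch below assumes each record is a Python dict and that the allowed range of each numeric variable is known in advance; `VALID_RANGES`, the field names, and `find_entry_errors` are hypothetical names for illustration. It flags out-of-range values and duplicated IDs, which still have to be checked against the original source.

```python
# Hypothetical codebook: allowed (min, max) for each numeric variable.
VALID_RANGES = {"age": (18, 99), "score": (0, 100)}


def find_entry_errors(records, valid_ranges=VALID_RANGES, id_field="id"):
    """Return a list of (record_id, field, value, problem) tuples."""
    problems = []
    seen_ids = set()
    for rec in records:
        rid = rec.get(id_field)
        # The same ID entered twice usually means a duplicated record
        if rid in seen_ids:
            problems.append((rid, id_field, rid, "duplicate id"))
        seen_ids.add(rid)
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            if value is None:
                continue  # missing values are screened separately
            if not (low <= value <= high):
                problems.append((rid, field, value, "out of range"))
    return problems
```

For instance, a record with `age` entered as 340 (probably a typo for 34) is reported as out of range, and a second record reusing ID 2 is reported as a duplicate.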

Outliers-

Outliers in a data set can be identified by inspecting the data values directly or through plots such as scatter plots, histograms, and box plots. For each outlier we have to check whether

• it occurred due to a data entry error
• it is a case that is not part of the target population at all
• it is a genuine case that is simply different from the others
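
Beyond visual inspection, outliers can be flagged numerically. The sketch below shows two common rules: standardized scores beyond a cutoff (3.29, i.e. two-tailed p < .001, is one frequently used choice) and Tukey's fences at 1.5 × IQR. The function names and cutoffs are illustrative assumptions; a flagged value still has to be checked against the three possibilities listed above.

```python
from statistics import fmean, quantiles, stdev


def z_score_outliers(values, cutoff=3.29):
    """Return indices of values whose |z| exceeds cutoff (z uses the sample SD)."""
    mean = fmean(values)
    sd = stdev(values)
    return [i for i, v in enumerate(values) if abs((v - mean) / sd) > cutoff]


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

The z-score rule needs a reasonably large sample: with n cases the largest possible |z| is (n - 1)/√n, so in very small samples no value can exceed 3.29. The IQR rule is less affected by this, because the quartiles are not pulled by the outlier itself.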

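For each outlier we have to analyze its leverage (how far the case lies from the others on the predictor variables), its discrepancy (how far it deviates from the pattern followed by the rest of the data), and its influence (how much the results change when it is removed, which depends on both leverage and discrepancy).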
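
For a simple regression with one predictor, these three quantities can be computed directly: leverage as the hat value, discrepancy as the internally studentized residual, and influence as Cook's distance, which combines the two. This is a minimal pure-Python sketch under that one-predictor assumption; `regression_diagnostics` is a hypothetical name, and with several predictors a statistics package would normally be used instead.

```python
import math
from statistics import fmean


def regression_diagnostics(x, y):
    """Leverage, studentized residual and Cook's distance for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Ordinary least-squares slope and intercept
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # number of fitted coefficients (intercept and slope)
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))
    results = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage (hat value)
        r = e / (s * math.sqrt(1 - h))       # discrepancy (studentized residual)
        d = r ** 2 / p * h / (1 - h)         # influence (Cook's distance)
        results.append({"leverage": h, "student_resid": r, "cooks_d": d})
    return results
```

For instance, with x = 1, 2, ..., 9, 20 and a y value at x = 20 that breaks an otherwise linear trend, that last case has both the largest leverage and by far the largest Cook's distance. The leverages always sum to the number of coefficients (here 2), which is a handy sanity check.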

Missing data-

We have to identify the missing data and check whether the values are missing at random. To do this, we can split the cases into two groups, one with missing values on a variable and one without, and perform a t-test on the other variables to examine whether the groups differ. If the difference is significant (that is, the data are not missing at random), we have to proceed with one of the following approaches.
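
The group comparison can be sketched with Welch's t-test, which does not assume equal variances in the two groups. The helper below (a hypothetical name) splits the values of a fully observed variable `x` according to whether `y` is missing (`None`) and returns the t statistic and the Welch-Satterthwaite degrees of freedom; each group needs at least two cases. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also returns a p-value.

```python
import math
from statistics import fmean, variance


def missingness_t_test(x, y):
    """Welch t-test of x between cases where y is missing vs. observed.

    x: fully observed values of another variable.
    y: values of the same length; None marks a missing value.
    Returns (t, df).
    """
    missing = [xi for xi, yi in zip(x, y) if yi is None]
    present = [xi for xi, yi in zip(x, y) if yi is not None]
    m1, m2 = fmean(missing), fmean(present)
    # Squared standard errors of each group mean
    v1 = variance(missing) / len(missing)
    v2 = variance(present) / len(present)
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(missing) - 1) + v2 ** 2 / (len(present) - 1))
    return t, df
```

A large |t| suggests that cases with missing `y` differ systematically on `x`, so simply deleting them could bias the results.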

• Cases or variables with missing data may be deleted.
• Missing values may be estimated and replaced, either using prior knowledge or by substituting the mean of the observed values (which leaves the mean unchanged but reduces the standard deviation).
• Missing values may be estimated with a regression approach, predicting them from other variables (though this is more time consuming).
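
The sketch below illustrates the two estimation strategies under simple assumptions: missing values are stored as `None`, and regression imputation uses a single fully observed predictor fitted on complete cases only. Both function names are hypothetical.

```python
from statistics import fmean


def mean_impute(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = fmean(observed)
    return [m if v is None else v for v in values]


def regression_impute(x, y):
    """Replace missing y with predictions from a least-squares fit of y on x.

    The fit uses complete cases only; x is assumed to be fully observed.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = fmean(xi for xi, _ in pairs)
    y_bar = fmean(yi for _, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs) / sxx
    b0 = y_bar - b1 * x_bar
    # Keep observed values, fill the gaps with fitted values
    return [b0 + b1 * xi if yi is None else yi for xi, yi in zip(x, y)]
```

For `scores = [4, 8, None, 6, None, 10]`, the observed values have mean 7 and sample standard deviation of about 2.58; after `mean_impute` the mean is still 7 but the standard deviation drops to 2.0. Regression imputation has a similar, milder drawback: imputed points lie exactly on the fitted line, so variability is still understated.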

After reconstructing the data set, we have to perform the analysis again.
