Question

Question: How would you scan for outliers in your dataset? What would you do with data points that are considered outliers?

Homework Answers

Answer #1

There are several ways to identify outliers:
a) For a single variable, draw a box plot and apply the 1.5 × IQR criterion: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged as outliers.
b) For two variables x and y, simply draw a scatter plot. Points that lie far from the bulk of the data are candidate outliers.
c) Compute a z-score for each value and flag those whose absolute z-score exceeds a chosen cutoff (3 is a common choice).
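
As a rough illustration of methods (a) and (c), here is a minimal Python sketch. The function names, the sample data, and the default cutoffs (k = 1.5 and threshold = 3) are assumptions chosen for the example, not part of the original answer. The quartiles come from the standard library's `statistics.quantiles` with the "inclusive" method; other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs((v - mean) / sd) > threshold]

# Hypothetical sample: nine typical values and one extreme value.
data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]

print(iqr_outliers(data))          # flagged by the box plot rule
print(zscore_outliers(data, 2.5))  # flagged with a looser z cutoff
print(zscore_outliers(data))       # default cutoff of 3
```

One caveat worth keeping in mind: in a very small sample the extreme value inflates the standard deviation, so a z-score cutoff of 3 may flag nothing even when a point is visibly extreme. The box plot rule is less affected because quartiles are resistant to extreme values.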

What to do with outliers:

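a) Drop the outliers from the data set, for example when they are clearly recording or measurement errors.
b) Cap the outliers, i.e. replace any value beyond a chosen limit with that limit.
c) Replace each outlier with a new value, such as the median of the remaining data.
d) Apply a transformation (for example, a log transformation) to reduce the influence of extreme values.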
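
The four options above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names and the sample data are hypothetical, the limits are taken from the 1.5 × IQR rule, the replacement value is the median of the non-outlying points, and the transformation shown is a natural log (which requires strictly positive data).

```python
import math
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) limits from the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, lower, upper):
    """a) Keep only the values inside the limits."""
    return [v for v in values if lower <= v <= upper]

def cap_outliers(values, lower, upper):
    """b) Clip values beyond a limit to that limit."""
    return [min(max(v, lower), upper) for v in values]

def replace_outliers(values, lower, upper):
    """c) Replace each outlier with the median of the inliers."""
    fill = statistics.median(drop_outliers(values, lower, upper))
    return [v if lower <= v <= upper else fill for v in values]

def log_transform(values):
    """d) Natural log; assumes every value is strictly positive."""
    return [math.log(v) for v in values]

# Hypothetical sample: nine typical values and one extreme value.
data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]
lower, upper = iqr_fences(data)

dropped = drop_outliers(data, lower, upper)
capped = cap_outliers(data, lower, upper)
replaced = replace_outliers(data, lower, upper)
logged = log_transform(data)
print(dropped, capped, replaced, logged, sep="\n")
```

Which option is appropriate depends on why the point is extreme: dropping suits clear data-entry errors, while capping or transforming keeps the observation but limits its influence.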

