Question

Answer the question below:

· What is the difference between data mining and data profiling?
· What should be done with suspect or missing data?

Homework Answers

Answer #1

Differences between data mining and data profiling:

Data mining is the process of examining an existing database to discover patterns and relationships in the data and turn them into useful information.

Data profiling is about analyzing existing data and collecting statistics about it, such as value ranges, frequencies, and the number of missing or distinct values. It helps in assessing data quality: it identifies wrong or inconsistent data in the database so that it can be corrected when necessary.

In short, data profiling describes the data itself and its quality, while data mining searches the data for patterns.
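To make the contrast concrete, here is a minimal sketch in plain Python. The `orders` data and the helper names `profile_column` and `frequent_pairs` are made up for illustration and are not from any particular tool. The first helper collects statistics about one column (profiling); the second looks for items that are often bought together (a very simplified form of mining).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each row is one customer order.
orders = [
    {"customer": "a", "age": 34, "items": ["bread", "butter"]},
    {"customer": "b", "age": None, "items": ["bread", "butter", "jam"]},
    {"customer": "c", "age": 29, "items": ["bread", "jam"]},
    {"customer": "d", "age": 290, "items": ["butter"]},
]


def profile_column(rows, column):
    """Data profiling: summary statistics that describe one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def frequent_pairs(rows, min_count):
    """Data mining (very simplified): item pairs bought together at least min_count times."""
    counts = Counter()
    for row in rows:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(row["items"])), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


age_profile = profile_column(orders, "age")
pairs = frequent_pairs(orders, min_count=2)
print(age_profile)
print(pairs)
```

The profile shows one missing age and a maximum of 290, which is a data quality problem worth checking. The pair counts say nothing about quality; they show a pattern in the data instead.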

Handling suspect or missing values

If the data set is large and only a few values are affected, the rows with missing values can simply be left out of the analysis.

Otherwise, take the median of that column and replace the missing values with it.

As a rule of thumb, if no more than about 5% to 10% of the values are missing, those rows can simply be dropped; if a larger percentage is missing, the missing values should be replaced with the median or mean of that particular column.
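The rule of thumb above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: the function name `handle_missing` is hypothetical, missing values are represented as `None`, the 10% cutoff is taken from the rule of thumb rather than any standard, and the sketch works on a single column (in a real table, dropping would remove the whole row).

```python
from statistics import median


def handle_missing(values, drop_threshold=0.10):
    """Drop missing entries if their share is at most drop_threshold,
    otherwise fill them with the median of the observed values.

    drop_threshold=0.10 is an assumed cutoff based on the rule of thumb.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values to work with")

    missing_share = (len(values) - len(present)) / len(values)
    if missing_share <= drop_threshold:
        # Few values missing: simply leave them out.
        return present

    # Many values missing: impute with the median of the observed values.
    fill = median(present)
    return [fill if v is None else v for v in values]


# 1 of 20 values missing (5%): the missing entry is dropped.
mostly_complete = handle_missing(list(range(19)) + [None])

# 1 of 4 values missing (25%): the missing entry is replaced by the median.
imputed = handle_missing([1, 2, None, 4])
```

The median is used here rather than the mean because it is less affected by extreme values, which matters when the column also contains suspect entries.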
