Question

Consider a data set with two attributes, A and B. A is continuous, whereas B is categorical with two values, "y" and "n", which can be treated as the class of each observation. When attribute A is discretized into two equal-width intervals, the class attribute B provides no information, but when A is discretized into three equal-width intervals, B provides perfect information. Construct a simple dataset with these characteristics.

Homework Answers

Answer #1

Let's take the following table as an example.

A B
0.0 y
1.14 y
2.3 n
2.78 n
3.48 n
3.9 n
5.5 y
6 y

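If we divide A into 2 equal-width intervals (i.e. 0-3 and 3-6), then both intervals of A have two 'y' and two 'n' values of B. The class split inside each interval is the same 50/50 split as in the whole dataset, so the interval an observation falls into tells us nothing about its B value.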

But if we divide A into 3 equal-width intervals (i.e. 0-2, 2-4 and 4-6), then the first and third intervals of A contain only 'y' values of B and the middle interval contains all the 'n' values. Every interval is pure, so the interval an observation falls into determines its B value exactly, which is the "perfect information" case.
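
The two cases can be checked numerically. The sketch below is not part of the original answer: it assumes that "information" means the usual entropy-based information gain (in bits), and the helper names `entropy`, `equal_width_bins` and `information_gain` are made up for illustration. Bins are taken over the observed range of A (0 to 6), with the maximum value placed in the last bin.

```python
from collections import Counter
from math import log2

# The dataset from the answer: (A, B) pairs.
data = [(0.0, "y"), (1.14, "y"), (2.3, "n"), (2.78, "n"),
        (3.48, "n"), (3.9, "n"), (5.5, "y"), (6.0, "y")]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def equal_width_bins(rows, k):
    """Group the B labels by which of k equal-width intervals A falls into."""
    lo = min(a for a, _ in rows)
    hi = max(a for a, _ in rows)
    width = (hi - lo) / k
    groups = [[] for _ in range(k)]
    for a, b in rows:
        # The maximum value of A would land in bin k, so clamp it to k - 1.
        index = min(int((a - lo) / width), k - 1)
        groups[index].append(b)
    return groups


def information_gain(rows, k):
    """Entropy of B minus the weighted entropy of B inside each interval."""
    labels = [b for _, b in rows]
    groups = equal_width_bins(rows, k)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups if g)
    return entropy(labels) - remainder


for k in (2, 3):
    counts = [dict(Counter(g)) for g in equal_width_bins(data, k)]
    print(f"{k} intervals: class counts {counts}, "
          f"gain {information_gain(data, k):.3f} bits")
```

Under these assumptions the gain comes out as 0 bits for two intervals and 1 bit (the full entropy of B) for three intervals, matching the reasoning above.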
