Consider a data set with two attributes, A and B. A is continuous, whereas B is categorical with two values, “y” and “n”, which can be treated as the class of each observation. When A is discretized into two equal-width intervals, the intervals provide no information about the class attribute B, but when A is discretized into three equal-width intervals, they provide perfect information about B. Construct a simple data set with these characteristics.
Let's take the following table as an example.
A | B
0.0 | y
1.14 | y
2.3 | n
2.78 | n
3.48 | n
3.9 | n
5.5 | y
6.0 | y
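The table can be checked mechanically by binning A. The following minimal Python sketch is only an illustration: the names `DATA`, `equal_width_bin` and `class_counts` are hypothetical, not from any library. It splits the observed range of A into k equal-width intervals (all half-open except the last, which is closed so the maximum value 6 is included) and counts the classes of B in each interval.

```python
from collections import Counter

# Data set from the table above: (value of A, class B)
DATA = [
    (0.0, "y"), (1.14, "y"), (2.3, "n"), (2.78, "n"),
    (3.48, "n"), (3.9, "n"), (5.5, "y"), (6.0, "y"),
]

def equal_width_bin(x, lo, hi, k):
    """Return the 0-based index of the equal-width interval containing x.

    Intervals are [lo, lo + w), [lo + w, lo + 2w), ..., and the last one is
    closed so that x == hi falls into interval k - 1.
    """
    width = (hi - lo) / k
    return min(int((x - lo) // width), k - 1)

def class_counts(data, k):
    """Count the classes of B inside each of k equal-width intervals of A."""
    values = [a for a, _ in data]
    lo, hi = min(values), max(values)
    counts = [Counter() for _ in range(k)]
    for a, b in data:
        counts[equal_width_bin(a, lo, hi, k)][b] += 1
    return counts

for k in (2, 3):
    print(k, [dict(c) for c in class_counts(DATA, k)])
# 2 [{'y': 2, 'n': 2}, {'n': 2, 'y': 2}]
# 3 [{'y': 2}, {'n': 4}, {'y': 2}]
```

With two intervals both bins hold two 'y' and two 'n'; with three intervals every bin holds a single class.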
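If we divide the range of A, [0, 6], into two equal-width intervals, [0, 3) and [3, 6], each interval contains two 'y' and two 'n' values of B. Both intervals therefore have the same 50/50 class distribution as the whole data set, so knowing which interval a value of A falls into tells us nothing about its class B (and, conversely, the class tells us nothing about the interval).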
If we instead divide A into three equal-width intervals, [0, 2), [2, 4) and [4, 6], the first and third intervals contain only 'y' values of B and the middle interval contains all the 'n' values. Every interval is pure, so knowing which interval a value of A falls into determines its class B exactly; this discretization provides perfect information about the class.
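The answer does not say how "information" is measured. One common choice, assumed here, is the mutual information (information gain) between the interval label and the class, I(interval; B) = H(B) - H(B | interval). The sketch below computes it in bits for both discretizations; the helper names are hypothetical. Since the data has four 'y' and four 'n', H(B) = 1 bit, so "no information" corresponds to 0 bits and "perfect information" to 1 bit.

```python
import math
from collections import Counter

# Data set from the table above: (value of A, class B)
DATA = [
    (0.0, "y"), (1.14, "y"), (2.3, "n"), (2.78, "n"),
    (3.48, "n"), (3.9, "n"), (5.5, "y"), (6.0, "y"),
]

def equal_width_bins(values, k):
    """Map each value to one of k equal-width intervals over [min, max].

    The last interval is closed, so the maximum lands in bin k - 1.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [min(int((v - lo) // width), k - 1) for v in values]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(data, k):
    """I(interval; B) = H(B) - H(B | interval), in bits (assumed measure)."""
    classes = [b for _, b in data]
    bins = equal_width_bins([a for a, _ in data], k)
    n = len(data)
    # H(B | interval): class entropy inside each interval, weighted by its size
    conditional = 0.0
    for j in set(bins):
        members = [b for b, idx in zip(classes, bins) if idx == j]
        conditional += len(members) / n * entropy(members)
    return entropy(classes) - conditional

for k in (2, 3):
    print(f"k={k}: I(interval; B) = {mutual_information(DATA, k):.3f} bits")
# k=2: I(interval; B) = 0.000 bits
# k=3: I(interval; B) = 1.000 bits
```

Other measures (for example, the Gini index) would lead to the same conclusion here, because the two-interval bins have exactly the overall class mix and the three-interval bins are pure.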