Q6) (a) In the data pre-processing stage, how would you analyze almost-unary columns (i.e. columns with almost only one value)? (b) How might this column be generated? (c) Does this type of variable have significant value in data mining?
Answer 6:
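(a) To evaluate an almost-unary column, first check whether the dominant value carries the same meaning for every record. If it does, the column behaves like a unary (constant) column. Second, inspect the few remaining records: they usually form only an insignificant part of the data.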
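A minimal sketch of the analysis in (a), assuming the column is available as a plain Python sequence. The function name `profile_column` and the sample data are hypothetical illustrations. It reports the dominant value, the share of records holding it, and the counts of the remaining (minority) values so they can be inspected individually.

```python
from collections import Counter
from typing import Hashable, Sequence


def profile_column(values: Sequence[Hashable]) -> dict:
    """Summarize how close a column is to being unary (hypothetical helper).

    Returns the most frequent value, its share of all records, and the
    counts of every other (minority) value.
    """
    if not values:
        raise ValueError("column is empty")
    counts = Counter(values)
    dominant, dominant_count = counts.most_common(1)[0]
    minority = {v: c for v, c in counts.items() if v != dominant}
    return {
        "dominant": dominant,
        "dominant_share": dominant_count / len(values),
        "minority_counts": minority,
    }


if __name__ == "__main__":
    # Hypothetical column: 97 records hold "N" and 3 hold "Y".
    column = ["N"] * 97 + ["Y"] * 3
    print(profile_column(column))
```

For this sample the dominant value "N" covers 97% of the records, and the three "Y" records are listed separately for inspection.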
(b) Such a column typically arises when nearly every record receives the same value (for example, a default that is rarely changed), so the other values form a negligible category. That category is usually too small to be important or to be modeled reliably by data mining algorithms. Rule of thumb: if 95 to 99 percent of a column's values are identical, the column is likely uninformative and can usually be ignored.
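The rule of thumb above can be applied across a whole table with a short screening pass. This is a sketch under stated assumptions: the table is represented as a hypothetical dict mapping column names to value sequences, and the 0.95 threshold is the lower end of the 95 to 99 percent range mentioned above, exposed as a parameter so it can be tightened.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def dominant_share(values: Sequence[Hashable]) -> float:
    """Fraction of records that hold the column's most frequent value."""
    if not values:
        return 0.0
    _, top = Counter(values).most_common(1)[0]
    return top / len(values)


def almost_unary_columns(
    table: Dict[str, Sequence[Hashable]], threshold: float = 0.95
) -> List[str]:
    """Names of columns whose most frequent value covers >= threshold of rows."""
    return [name for name, col in table.items() if dominant_share(col) >= threshold]


if __name__ == "__main__":
    # Hypothetical table: "country" is almost unary, "age_group" is not.
    table = {
        "country": ["US"] * 98 + ["CA"] * 2,
        "age_group": ["18-30"] * 40 + ["31-50"] * 35 + ["51+"] * 25,
    }
    print(almost_unary_columns(table))  # ['country']
```

Flagged columns are candidates for removal, not automatic deletions; the minority values should still be checked before dropping anything.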
(c) Even so, such a variable can still be valuable in data mining: its extreme skew should prompt the analyst to ask why the values are so imbalanced and to find ways of handling that imbalance.
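One possible way to probe why the values are skewed, assuming a binary (0/1) target variable is available, is to compare the target rate for each value of the almost-unary column. If the rare value behaves very differently from the dominant one, the column may be informative despite its skew. The helper `target_rate_by_value` and the sample data are hypothetical illustrations, not a prescribed method.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def target_rate_by_value(
    feature: Sequence[Hashable], target: Sequence[int]
) -> Dict[Hashable, float]:
    """Mean of a binary (0/1) target for each distinct feature value."""
    if len(feature) != len(target):
        raise ValueError("feature and target must have the same length")
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for value, label in zip(feature, target):
        totals[value] += 1
        positives[value] += label
    return {value: positives[value] / totals[value] for value in totals}


if __name__ == "__main__":
    # Hypothetical data: 97 "N" records with 2 positives,
    # and 3 "Y" records that are all positive.
    feature = ["N"] * 97 + ["Y"] * 3
    target = [1, 1] + [0] * 95 + [1, 1, 1]
    print(target_rate_by_value(feature, target))
```

Here the rare value "Y" has a target rate of 1.0 versus roughly 2% for "N", which would suggest the column deserves a closer look rather than removal. With only three records, however, such a difference could easily be noise.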