In this question, your task is to help prevent cybercrime by performing some data analytics.
You are given the following sample of emails, categorized by whether the subject line contains the word ‘Free’ or ‘Sale’ and by whether the email is actually spam (‘Y’ means the email has the word or is spam; ‘N’ means the email does not have the word or is not spam):
Subject has ‘Free’   Subject has ‘Sale’   Spam   Frequency
Y                    Y                    Y      11
Y                    Y                    N      15
Y                    N                    Y      3
Y                    N                    N      19
N                    Y                    Y      25
N                    Y                    N      4
N                    N                    Y      41
N                    N                    N      7
Total                                            125
e. What is the probability that an email without the word ‘Sale’ in its subject line is spam?
f. (1 mark) What single property (e.g. the word ‘Free’ is in the subject line) is the most likely to indicate that an email is spam (hint: use association rules)?
g. What complete set of properties (i.e. assigning values over all properties of ‘Free’ and ‘Sale’) is most likely to indicate that an email is spam?
h. (1 mark) What property (i.e. any single property or any combination of properties) is the most likely to indicate that an email is spam?
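
One way to approach parts e to h is to treat each candidate property as an association rule of the form "property -> spam" and compare the rules by confidence, that is, the number of spam emails matching the property divided by the number of all emails matching it. The following is a minimal sketch under that assumption, not a prescribed solution. The names `ROWS`, `COLUMN`, `confidence`, and `candidate_rules` are hypothetical and chosen only for illustration; the counts are copied from the table above.

```python
from fractions import Fraction

# Rows copied from the table: (free, sale, spam, frequency).
ROWS = [
    ("Y", "Y", "Y", 11),
    ("Y", "Y", "N", 15),
    ("Y", "N", "Y", 3),
    ("Y", "N", "N", 19),
    ("N", "Y", "Y", 25),
    ("N", "Y", "N", 4),
    ("N", "N", "Y", 41),
    ("N", "N", "N", 7),
]

# Hypothetical names for the two subject-word columns.
COLUMN = {"free": 0, "sale": 1}


def confidence(conditions):
    """Confidence of the rule `conditions -> spam`, i.e. P(spam | conditions).

    `conditions` maps a column name ("free" or "sale") to "Y" or "N".
    """
    matching = [
        row for row in ROWS
        if all(row[COLUMN[name]] == value for name, value in conditions.items())
    ]
    total = sum(row[3] for row in matching)
    spam = sum(row[3] for row in matching if row[2] == "Y")
    return Fraction(spam, total)


def candidate_rules():
    """Every single property and every complete assignment of both properties."""
    rules = []
    for free in (None, "Y", "N"):
        for sale in (None, "Y", "N"):
            rule = {}
            if free is not None:
                rule["free"] = free
            if sale is not None:
                rule["sale"] = sale
            if rule:  # skip the empty rule, which has no conditions
                rules.append(rule)
    return rules


if __name__ == "__main__":
    # List all rules from highest to lowest confidence.
    for rule in sorted(candidate_rules(), key=confidence, reverse=True):
        c = confidence(rule)
        print(rule, c, round(float(c), 3))
```

For part e, `confidence({"sale": "N"})` gives the probability that an email without ‘Sale’ in its subject line is spam. Rules with a single key correspond to part f, rules with both keys to part g, and the full ranked list to part h. Exact fractions are used so that near-ties between rules are not hidden by rounding.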