| 'Free' in subject | 'Sale' in subject | Spam | Frequency |
|---|---|---|---|
| Y | Y | Y | 20 |
| Y | Y | N | 8 |
| Y | N | Y | 6 |
| Y | N | N | 11 |
| N | Y | Y | 4 |
| N | Y | N | 19 |
| N | N | Y | 10 |
| N | N | N | 37 |
| Total | | | 115 |
We wish to create a spam filter. As a first pass, we want to derive some association rules from the sample of email data given in the table above. Using the table, answer the following (give all answers to 3 significant digits):
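Every answer below is a ratio of frequency counts from the table. Writing $n(\cdot)$ for the summed frequency of the matching rows and $N = 115$ for the total, the two formulas used are the joint probability and the conditional probability. In association-rule terms these are the *support* of an itemset and the *confidence* of a rule $X \Rightarrow Y$; reading the parts this way is our interpretation of the task, not something the question spells out.

```latex
\[
P(X, Y) = \frac{n(X, Y)}{N}
\qquad\text{(support of } \{X, Y\}\text{)}
\]
\[
P(Y \mid X) = \frac{n(X, Y)}{n(X)}
\qquad\text{(confidence of } X \Rightarrow Y\text{)}
\]
```

Part (a) is an instance of the first formula; parts (b) to (e) are instances of the second with $Y$ being "Spam = Y".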
(a) The probability that an email both has the word 'Free' in its subject line and is spam = P('Free' = Y, Spam = Y) = (20 + 6)/115 = 26/115 = 0.226
(b) The required probability = P(Spam = Y | 'Free' = Y)
= (20 + 6)/(20 + 8 + 6 + 11) = 26/45 = 0.578
(c) The required probability = P(Spam = Y | 'Free' = N)
= (4 + 10)/(4 + 19 + 10 + 37) = 14/70 = 0.200
(d) The required probability = P(Spam = Y | 'Sale' = Y)
= (20 + 4)/(20 + 8 + 4 + 19) = 24/51 = 0.471
(e) The required probability = P(Spam = Y | 'Sale' = N)
= (6 + 10)/(6 + 11 + 10 + 37) = 16/64 = 0.250
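(f) The word 'Free' in the subject line is the property most likely to indicate that an email is spam, since P(Spam = Y | 'Free' = Y) = 0.578 exceeds P(Spam = Y | 'Sale' = Y) = 0.471.
(g) Having both the words 'Free' and 'Sale' in the subject line is the complete set of properties most likely to indicate that an email is spam: P(Spam = Y | 'Free' = Y, 'Sale' = Y) = 20/(20 + 8) = 0.714, which is higher than for any other combination of the two words (6/17 = 0.353, 4/23 = 0.174, 10/47 = 0.213).
(h) The word 'Free' in the subject line is the property most likely to indicate that an email is spam.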
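As a cross-check, the sketch below recomputes parts (a) to (g) directly from the frequency table using exact fractions, then rounds to 3 significant digits. The dictionary encoding, the column names `Free`, `Sale` and `Spam`, and the helpers `count`, `prob` and `sig3` are illustrative choices of ours, not part of the original question.

```python
from fractions import Fraction

# Frequency table from above (illustrative encoding):
# ('Free' in subject, 'Sale' in subject, Spam) -> number of emails
TABLE = {
    ("Y", "Y", "Y"): 20,
    ("Y", "Y", "N"): 8,
    ("Y", "N", "Y"): 6,
    ("Y", "N", "N"): 11,
    ("N", "Y", "Y"): 4,
    ("N", "Y", "N"): 19,
    ("N", "N", "Y"): 10,
    ("N", "N", "N"): 37,
}
COLUMNS = ("Free", "Sale", "Spam")


def count(**conditions):
    """Sum the frequencies of all rows matching every column=value condition."""
    total = 0
    for row, freq in TABLE.items():
        values = dict(zip(COLUMNS, row))
        if all(values[col] == val for col, val in conditions.items()):
            total += freq
    return total


def prob(event, given=None):
    """Exact P(event | given); with no condition this is the joint P(event)."""
    given = given or {}
    return Fraction(count(**event, **given), count(**given))


def sig3(x):
    """Format to 3 significant digits, keeping trailing zeros (e.g. 0.200)."""
    return f"{float(x):#.3g}"


# Parts (a) to (e): one joint and four conditional probabilities
answers = {
    "a": prob({"Free": "Y", "Spam": "Y"}),
    "b": prob({"Spam": "Y"}, {"Free": "Y"}),
    "c": prob({"Spam": "Y"}, {"Free": "N"}),
    "d": prob({"Spam": "Y"}, {"Sale": "Y"}),
    "e": prob({"Spam": "Y"}, {"Sale": "N"}),
}
for part, p in answers.items():
    print(f"({part}) {p} = {sig3(p)}")

# (f) Single-word rules: which word gives the highest P(Spam=Y | word=Y)?
single = {w: prob({"Spam": "Y"}, {w: "Y"}) for w in ("Free", "Sale")}
best_word = max(single, key=single.get)

# (g) All four ('Free', 'Sale') combinations
combos = {
    (f, s): prob({"Spam": "Y"}, {"Free": f, "Sale": s})
    for f in "YN"
    for s in "YN"
}
best_combo = max(combos, key=combos.get)

print(best_word, sig3(single[best_word]))    # Free 0.578
print(best_combo, sig3(combos[best_combo]))  # ('Y', 'Y') 0.714
```

Using `Fraction` keeps every ratio exact until the final rounding step, so the 3-significant-digit values are not affected by intermediate floating-point error; the `#` flag in the format spec keeps trailing zeros such as 0.200 and 0.250.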