Question

| ‘Free’ (subject word) | ‘Sale’ (subject word) | Spam | Frequency |
|---|---|---|---|
| Y | Y | Y | 20 |
| Y | Y | N | 8 |
| Y | N | Y | 6 |
| Y | N | N | 11 |
| N | Y | Y | 4 |
| N | Y | N | 19 |
| N | N | Y | 10 |
| N | N | N | 37 |
| Total | | | 115 |

We wish to create a spam filter. As a first pass, we want to determine some association rules from the sample set of email data given in the table above. Based on that table (provide all answers to 3 significant digits):

  1. What is the probability that an email has both the word ‘Free’ in its subject line and is spam?
  2. What is the probability that an email with the word ‘Free’ in its subject line is spam?
  3. What is the probability that an email without the word ‘Free’ in its subject line is spam?
  4. What is the probability that an email with the word ‘Sale’ in its subject line is spam?
  5. What is the probability that an email without the word ‘Sale’ in its subject line is spam?
  6. (1 mark) What single property (e.g. the word ‘Free’ is in the subject line) is the most likely to indicate that an email is spam (hint: use association rules)?
  7. What complete set of properties (i.e. assigning values over all properties of ‘Free’ and ‘Sale’) is most likely to indicate that an email is spam?
  8. (1 mark) What property (i.e. over all possible single or combination) is the most likely to indicate that an email is spam?

Homework Answers

Answer #1

(a) The probability that an email has both the word 'Free' in its subject line and is spam = P('Free' = Y, Spam = Y) = (20 + 6)/115 = 26/115 = 0.226
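The joint probability in (a) can be checked with a minimal Python sketch. The `ROWS` list is just the frequency table typed in by hand, and the variable names are illustrative assumptions, not part of the original answer.

```python
# Rows of the frequency table: (free, sale, spam, count).
ROWS = [
    ("Y", "Y", "Y", 20),
    ("Y", "Y", "N", 8),
    ("Y", "N", "Y", 6),
    ("Y", "N", "N", 11),
    ("N", "Y", "Y", 4),
    ("N", "Y", "N", 19),
    ("N", "N", "Y", 10),
    ("N", "N", "N", 37),
]

# Total number of emails in the sample.
total = sum(count for *_, count in ROWS)

# Joint count: 'Free' is in the subject line AND the email is spam.
free_and_spam = sum(
    count for free, _, spam, count in ROWS if free == "Y" and spam == "Y"
)

p_free_and_spam = free_and_spam / total
print(f"{p_free_and_spam:.3g}")  # 0.226
```

Note that the denominator is the whole sample (115), because this is a joint probability and not a conditional one.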

(b) The required probability = P(Spam = Y | 'Free' = Y)

= (20 + 6)/(20 + 8 + 6 + 11) = 26/45 = 0.578

(c) The required probability = P(Spam = Y | 'Free' = N)

= (4 + 10)/(4 + 19 + 10 + 37) = 14/70 = 0.200

(d) The required probability = P(Spam = Y | 'Sale' = Y)

= (20 + 4)/(20 + 8 + 4 + 19) = 24/51 = 0.471

(e) The required probability = P(Spam = Y | 'Sale' = N)

= (6 + 10)/(6 + 11 + 10 + 37) = 16/64 = 0.250
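The four conditional probabilities in (b) to (e) all follow the same pattern: restrict the table to the rows that match the condition, then divide the spam count by the restricted total. A small sketch, assuming a hypothetical helper `p_spam_given` and the table typed in as `ROWS`:

```python
# Rows of the frequency table: (free, sale, spam, count).
ROWS = [
    ("Y", "Y", "Y", 20),
    ("Y", "Y", "N", 8),
    ("Y", "N", "Y", 6),
    ("Y", "N", "N", 11),
    ("N", "Y", "Y", 4),
    ("N", "Y", "N", 19),
    ("N", "N", "Y", 10),
    ("N", "N", "N", 37),
]

# Position of each subject-word column inside a row tuple.
COLUMNS = {"free": 0, "sale": 1}


def p_spam_given(word, value):
    """Return P(Spam = Y | word = value) from the counts in ROWS."""
    col = COLUMNS[word]
    matching = [row for row in ROWS if row[col] == value]
    spam_count = sum(row[3] for row in matching if row[2] == "Y")
    return spam_count / sum(row[3] for row in matching)


for word in ("free", "sale"):
    for value in ("Y", "N"):
        print(word, value, f"{p_spam_given(word, value):.3f}")
```

Unlike the joint probability in (a), the denominator here is the number of emails that satisfy the condition (for example 45 for 'Free' = Y), not the full sample of 115.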

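(f) The word 'Free' in the subject line is the single property most likely to indicate that an email is spam: the rule 'Free' = Y → Spam = Y has confidence 26/45 = 0.578, higher than 'Sale' = Y (0.471), 'Sale' = N (0.250) and 'Free' = N (0.200).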

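(g) Having both the words 'Free' and 'Sale' in the subject line is the complete set of properties most likely to indicate that an email is spam: P(Spam = Y | 'Free' = Y, 'Sale' = Y) = 20/28 = 0.714, compared with 6/17 = 0.353 for ('Free' = Y, 'Sale' = N), 4/23 = 0.174 for ('Free' = N, 'Sale' = Y) and 10/47 = 0.213 for ('Free' = N, 'Sale' = N).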

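(h) Over all single properties and combinations, having both 'Free' and 'Sale' in the subject line is the most likely to indicate that an email is spam: its confidence of 20/28 = 0.714 exceeds the best single-property confidence of 26/45 = 0.578 for 'Free' = Y alone.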
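Parts (f) to (h) can be cross-checked by ranking every candidate rule "conditions → Spam = Y" by its confidence. This is only a sketch: it assumes confidence is the ranking measure the hint about association rules has in mind, and the names `ROWS`, `confidence`, `rules` and `ranked` are illustrative.

```python
from itertools import product

# Rows of the frequency table: (free, sale, spam, count).
ROWS = [
    ("Y", "Y", "Y", 20),
    ("Y", "Y", "N", 8),
    ("Y", "N", "Y", 6),
    ("Y", "N", "N", 11),
    ("N", "Y", "Y", 4),
    ("N", "Y", "N", 19),
    ("N", "N", "Y", 10),
    ("N", "N", "N", 37),
]


def confidence(free=None, sale=None):
    """Confidence of the rule {conditions} -> Spam = Y; None means 'any value'."""
    matching = [
        (spam, count)
        for f, s, spam, count in ROWS
        if free in (None, f) and sale in (None, s)
    ]
    covered = sum(count for _, count in matching)
    return sum(count for spam, count in matching if spam == "Y") / covered


# Every rule with at least one condition:
# 4 single-property rules plus 4 full combinations.
rules = [
    (f, s)
    for f, s in product((None, "Y", "N"), repeat=2)
    if (f, s) != (None, None)
]
ranked = sorted(rules, key=lambda rule: confidence(*rule), reverse=True)

for free, sale in ranked:
    print(f"Free={free} Sale={sale} confidence={confidence(free, sale):.3f}")
```

The first line printed is the rule with 'Free' = Y and 'Sale' = Y (confidence 0.714), followed by 'Free' = Y alone (0.578). Confidence ignores how many emails a rule covers, so a real filter would likely also weigh each rule's support.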
