The following training dataset is “reading email dataset”. This dataset has four features as follows: author,...

Question

The following training dataset is the “reading email” dataset.

This dataset has four features: author, thread, length, and where the mail is read. From these features, the algorithm has to predict the user’s action, i.e. whether the user reads or skips the mail.

Use a naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread is a follow-up, the mail is short, and it is read at home.

Author	Thread	Length	Where to read	User’s Action
known	new	long	home	skips
unknown	new	short	work	reads
unknown	follow up	long	work	skips
known	follow up	long	home	skips
known	new	short	home	reads
known	follow up	long	work	skips
unknown	new	short	work	skips
unknown	new	short	work	reads
known	follow up	long	home	skips
known	new	long	work	skips
unknown	follow up	short	home	skips
known	new	long	work	skips
known	follow up	short	home	reads
known	new	short	work	reads
known	new	short	home	reads
known	follow up	short	work	reads
known	new	short	home	reads
unknown	new	short	work	reads
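As a check on the answer's programs, the frequency-count form of the naïve Bayes rule can be worked by hand for the query x = (known, follow up, short, home). This sketch uses no Laplace smoothing. The counts are read off the 18 rows of the table (9 reads, 9 skips; among reads: 6 known, 2 follow up, 9 short, 4 home; among skips: 6 known, 5 follow up, 2 short, 4 home).

```latex
\begin{aligned}
P(\text{reads}\mid x) &\propto P(\text{reads})\,P(\text{known}\mid\text{reads})\,P(\text{follow up}\mid\text{reads})\,P(\text{short}\mid\text{reads})\,P(\text{home}\mid\text{reads}) \\
&= \tfrac{9}{18}\cdot\tfrac{6}{9}\cdot\tfrac{2}{9}\cdot\tfrac{9}{9}\cdot\tfrac{4}{9} = \tfrac{216}{6561} \approx 0.0329 \\
P(\text{skips}\mid x) &\propto \tfrac{9}{18}\cdot\tfrac{6}{9}\cdot\tfrac{5}{9}\cdot\tfrac{2}{9}\cdot\tfrac{4}{9} = \tfrac{120}{6561} \approx 0.0183 \\
P(\text{reads}\mid x) &= \frac{216}{216 + 120} = \frac{9}{14} \approx 0.643
\end{aligned}
```

Under this model the user is predicted to read the mail.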

1. Write Python code that implements a naïve Bayes classifier and predicts the user’s action (skips or reads) when the author of the mail is known, the thread is a follow-up, the mail is short, and it is read at home. (Do not use scikit-learn.)
2. Use scikit-learn to predict the user’s action (skips or reads) for the same case: known author, follow-up thread, short mail, read at home.

Hint: for the author feature you can use 0 and 1 instead of unknown and known; for the thread feature, 0 and 1 instead of follow up and new; for the length feature, 0 and 1 instead of short and long; and for the where-to-read feature, 0 and 1 instead of home and work. For the target, use 0 for skips and 1 for reads.
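One way to apply the hint is a fixed lookup table, sketched below in plain Python. The dictionary keys `author`, `thread`, `length`, `where`, and `action` and the helper `encode` are hypothetical names chosen for illustration. A fixed mapping is safer than `pd.factorize`, which assigns codes in order of first appearance, so its 0/1 codes need not match the hint. Under the hint's codes, the question's query (known, follow up, short, home) becomes `[1, 0, 0, 0]`.

```python
# 0/1 codes taken from the hint; keys are the lower-cased table values
ENCODING = {
    "author": {"unknown": 0, "known": 1},
    "thread": {"follow up": 0, "new": 1},
    "length": {"short": 0, "long": 1},
    "where": {"home": 0, "work": 1},
    "action": {"skips": 0, "reads": 1},  # target variable
}
FEATURES = ["author", "thread", "length", "where"]


def encode(record):
    """Map a dict of raw (possibly mixed-case) feature values to a 0/1 vector."""
    return [ENCODING[f][record[f].strip().lower()] for f in FEATURES]


# The query from the question: known author, follow-up thread, short mail, read at home
query = {"author": "Known", "thread": "Follow up", "length": "short", "where": "Home"}
x_query = encode(query)
print(x_query)
```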

Write two different programs: program 1 does not use scikit-learn, and program 2 does.


Answer #1

Program 1: without scikit-learn

import numpy as np
import pandas as pd


class GaussianNB:
    """Gaussian naive Bayes implemented with NumPy only (no scikit-learn)."""

    def fit(self, X, y, epsilon=1e-10):
        # Class labels and their prior probabilities P(y = k)
        self.y_classes, y_counts = np.unique(y, return_counts=True)
        self.phi_y = y_counts / y_counts.sum()
        # Per-class mean and variance of each feature; epsilon keeps variances > 0
        self.u = np.array([X[y == k].mean(axis=0) for k in self.y_classes])
        self.var_x = np.array([X[y == k].var(axis=0) + epsilon for k in self.y_classes])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.apply_along_axis(self.compute_probs, 1, X)

    def compute_probs(self, x):
        # Return the class label with the largest unnormalized posterior
        probs = np.array([self.compute_prob(x, k) for k in range(len(self.y_classes))])
        return self.y_classes[np.argmax(probs)]

    def compute_prob(self, x, k):
        # P(y = k) * product of the Gaussian densities N(x_i; u_ki, var_ki)
        c = 1.0 / np.sqrt(2.0 * np.pi * self.var_x[k])
        likelihood = np.prod(c * np.exp(-np.square(x - self.u[k]) / (2.0 * self.var_x[k])))
        return self.phi_y[k] * likelihood

    def evaluate(self, X, y):
        # Fraction of correctly predicted labels
        return (self.predict(X) == y).mean()


pop = pd.read_csv('./Downloads/convertcsv.csv', dtype=str)
# Lower-case and trim every cell; drop spaces and apostrophes from the headers
# ("Where to read" -> "Wheretoread", "User’s Action" -> "UsersAction")
pop = pop.apply(lambda col: col.str.strip().str.lower())
pop.columns = pop.columns.str.replace(r"[\s'’]", "", regex=True)

# 0/1 encoding from the hint (pd.factorize would assign codes by row order)
pop['Author'] = pop['Author'].map({'unknown': 0, 'known': 1})
pop['Thread'] = pop['Thread'].map({'follow up': 0, 'new': 1})
pop['Length'] = pop['Length'].map({'short': 0, 'long': 1})
pop['Wheretoread'] = pop['Wheretoread'].map({'home': 0, 'work': 1})
pop['UsersAction'] = pop['UsersAction'].map({'skips': 0, 'reads': 1})

## naive Bayes

X = pop[['Author', 'Thread', 'Length', 'Wheretoread']].to_numpy(dtype=float)
Y = pop['UsersAction'].to_numpy()

clf = GaussianNB().fit(X, Y)
print(clf.evaluate(X, Y))  # accuracy on the training data

# known author, follow-up thread, short mail, read at home
Xtest = [[1, 0, 0, 0]]
print(clf.predict(Xtest))  # 1 = reads, 0 = skips
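Every feature in this dataset is binary, so a count-based (categorical) naïve Bayes is arguably a more natural model than the Gaussian one: each likelihood is simply a relative frequency from the table. The following is a minimal pure-Python sketch of that swapped-in variant, not the answer's method. `DATA`, `fit_counts`, and `posteriors` are illustrative names, and `alpha` is an optional Laplace-smoothing parameter assumed here.

```python
from collections import Counter

# Rows transcribed from the question's table: (author, thread, length, where to read, action)
DATA = [
    ("known", "new", "long", "home", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("unknown", "follow up", "long", "work", "skips"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "long", "work", "skips"),
    ("unknown", "new", "short", "work", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("unknown", "follow up", "short", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("known", "follow up", "short", "home", "reads"),
    ("known", "new", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("unknown", "new", "short", "work", "reads"),
]


def fit_counts(rows):
    """Count classes and (feature index, value, class) triples."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = Counter()
    for row in rows:
        *features, label = row
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def posteriors(rows, query, alpha=0.0):
    """Return normalized P(class | query); alpha > 0 applies Laplace smoothing."""
    class_counts, feature_counts = fit_counts(rows)
    # Number of distinct values per feature (2 for every feature here)
    n_values = [len({row[i] for row in rows}) for i in range(len(query))]
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for i, value in enumerate(query):
            # likelihood P(feature_i = value | class) as a (smoothed) relative frequency
            score *= (feature_counts[(i, value, label)] + alpha) / (n_c + alpha * n_values[i])
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


query = ("known", "follow up", "short", "home")
probs = posteriors(DATA, query)
print(max(probs, key=probs.get), probs)
```

For the question's query it returns `reads`, with P(reads | x) = 9/14 ≈ 0.643 without smoothing and 0.625 with `alpha=1.0`.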

Program 2: with scikit-learn

import pandas as pd
from sklearn.naive_bayes import GaussianNB

pop = pd.read_csv('./Downloads/convertcsv.csv', dtype=str)
# Lower-case and trim every cell; drop spaces and apostrophes from the headers
# ("Where to read" -> "Wheretoread", "User’s Action" -> "UsersAction")
pop = pop.apply(lambda col: col.str.strip().str.lower())
pop.columns = pop.columns.str.replace(r"[\s'’]", "", regex=True)

# 0/1 encoding from the hint (pd.factorize would assign codes by row order)
pop['Author'] = pop['Author'].map({'unknown': 0, 'known': 1})
pop['Thread'] = pop['Thread'].map({'follow up': 0, 'new': 1})
pop['Length'] = pop['Length'].map({'short': 0, 'long': 1})
pop['Wheretoread'] = pop['Wheretoread'].map({'home': 0, 'work': 1})
pop['UsersAction'] = pop['UsersAction'].map({'skips': 0, 'reads': 1})

## naive Bayes

X = pop[['Author', 'Thread', 'Length', 'Wheretoread']].to_numpy(dtype=float)
Y = pop['UsersAction'].to_numpy()

clf = GaussianNB().fit(X, Y)
print(clf.score(X, Y))  # accuracy on the training data

# known author, follow-up thread, short mail, read at home
Xtest = [[1, 0, 0, 0]]
print(clf.predict(Xtest))  # 1 = reads, 0 = skips
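scikit-learn also provides `CategoricalNB` in `sklearn.naive_bayes`, which treats each feature as a categorical variable rather than a Gaussian and so matches the 0/1 data more directly. The sketch below hard-codes the table with the hint's 0/1 codes (an assumption made so it runs without the CSV file) and uses the default Laplace smoothing `alpha=1.0`.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Table rows encoded with the hint's codes:
# author (unknown=0, known=1), thread (follow up=0, new=1),
# length (short=0, long=1), where to read (home=0, work=1)
X = np.array([
    [1, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 1],
    [0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1],
    [1, 0, 0, 0], [1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1],
])
# Target: skips=0, reads=1
y = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

clf = CategoricalNB(alpha=1.0).fit(X, y)

# known author, follow-up thread, short mail, read at home
x_query = np.array([[1, 0, 0, 0]])
print(clf.predict(x_query), clf.predict_proba(x_query))
```

For this query the model predicts class 1 (reads), with probabilities 0.375 for skips and 0.625 for reads.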
