This dataset has four features: author, thread, length, and where the mail is read. Based on these features, the algorithm has to predict the user’s action, that is, whether the user reads or skips the mail.
Use a Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread is a follow up, the mail is short, and the mail is read at home.
Author | Thread | Length | Where to read | User’s Action
known | new | long | home | skips
unknown | new | short | work | reads
unknown | follow up | long | work | skips
known | follow up | long | home | skips
known | new | short | home | reads
known | follow up | long | work | skips
unknown | new | short | work | skips
unknown | new | short | work | reads
known | follow up | long | home | skips
known | new | long | work | skips
unknown | follow up | short | home | skips
known | new | long | work | skips
known | follow up | short | home | reads
known | new | short | work | reads
known | new | short | home | reads
known | follow up | short | work | reads
known | new | short | home | reads
unknown | new | short | work | reads
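Worked by hand, Naive Bayes compares P(c) · ∏ P(x_i | c) for the two classes, with every probability estimated by counting rows of the table (no smoothing). Of the 18 mails, 9 are reads and 9 are skips. Among the skips, 6 have a known author, 5 are follow ups, 2 are short and 4 are read at home; among the reads, the same counts are 6, 2, 9 and 4.

```latex
\begin{aligned}
P(\text{skips})\prod_i P(x_i \mid \text{skips}) &= \tfrac{9}{18}\cdot\tfrac{6}{9}\cdot\tfrac{5}{9}\cdot\tfrac{2}{9}\cdot\tfrac{4}{9} = \tfrac{40}{2187} \approx 0.0183\\
P(\text{reads})\prod_i P(x_i \mid \text{reads}) &= \tfrac{9}{18}\cdot\tfrac{6}{9}\cdot\tfrac{2}{9}\cdot\tfrac{9}{9}\cdot\tfrac{4}{9} = \tfrac{72}{2187} \approx 0.0329\\
P(\text{reads} \mid x) &= \frac{72}{72 + 40} = \frac{9}{14} \approx 0.64
\end{aligned}
```

The reads score is the larger one, so the classifier predicts that the user reads this mail.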
Hint: in the Author feature you can use 0 and 1 instead of unknown and known. In the Thread feature you can use 0 and 1 instead of follow up and new. In the Length feature you can use 0 and 1 instead of short and long. In the Where to read feature you can use 0 and 1 instead of home and work. In the target you can use 0 instead of skips and 1 instead of reads.
Below are two programs that answer the question: the first implements Gaussian Naive Bayes with NumPy, and the second uses scikit-learn's GaussianNB.
Without scikit-learn
import numpy as np
import pandas as pd


def accuracy_score(pre, y):
    # Fraction of predictions that match the true labels
    return np.mean(np.asarray(pre) == np.asarray(y))
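class GaussianNB:
    """Minimal Gaussian Naive Bayes: one normal distribution per feature and class."""

    def fit(self, X, y, epsilon=1e-10):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.y_classes, y_counts = np.unique(y, return_counts=True)
        # Distinct values of each feature (kept for reference, not used in prediction)
        self.x_classes = [np.unique(x) for x in X.T]
        # Class priors P(y = k)
        self.phi_y = 1.0 * y_counts / y_counts.sum()
        # Per-class feature means and variances; epsilon stops a zero variance from dividing by zero
        self.u = np.array([X[y == k].mean(axis=0) for k in self.y_classes])
        self.var_x = np.array([X[y == k].var(axis=0) + epsilon for k in self.y_classes])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.apply_along_axis(self.compute_probs, 1, X)

    def compute_probs(self, x):
        # Score every class and return the label with the highest score
        probs = np.array([self.compute_prob(x, k) for k in range(len(self.y_classes))])
        return self.y_classes[np.argmax(probs)]

    def compute_prob(self, x, k):
        # Unnormalized posterior: class prior times the product of per-feature normal densities
        c = 1.0 / np.sqrt(2.0 * np.pi * self.var_x[k])
        likelihood = np.prod(c * np.exp(-np.square(x - self.u[k]) / (2.0 * self.var_x[k])))
        return self.phi_y[k] * likelihood

    def evaluate(self, X, y):
        # Fraction of correct predictions
        return (self.predict(X) == np.asarray(y)).mean()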
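pop = pd.read_csv('./Downloads/convertcsv.csv', dtype='category')
# Strip and lowercase every value so "Known" and "known" become the same category
pop = pop.apply(lambda x: x.astype(str).str.strip().str.lower())
# Remove spaces (and apostrophes, in case the header is "User’s Action") from column names
pop.columns = pop.columns.str.replace(' ', '').str.replace('’', '').str.replace("'", '')
# pd.factorize numbers values in order of first appearance. Assuming the CSV rows follow the
# table above, this gives known=0/unknown=1, new=0/follow up=1, long=0/short=1,
# home=0/work=1 and skips=0/reads=1 (not the same codes as the hint for the first three features)
for col in ['Author', 'Thread', 'Length', 'Wheretoread', 'UsersAction']:
    pop[col] = pd.factorize(pop[col])[0]

# Naive Bayes
X = pop[['Author', 'Thread', 'Length', 'Wheretoread']].to_numpy()
Y = pop['UsersAction'].to_numpy()
clf = GaussianNB().fit(X, Y)
print(clf.evaluate(X, Y))  # accuracy on the training data
# Query: known author, follow-up thread, short mail, read at home
Xtest = [[0, 1, 1, 0]]
print(clf.predict(Xtest))  # 1 means reads, 0 means skips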
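Because every feature is categorical, the frequency-count form of Naive Bayes used in the hand calculation can also be coded directly, with no Gaussian assumption. This is a swapped-in technique (categorical Naive Bayes), not what either program does; ROWS and categorical_nb_scores are hypothetical names, and alpha is an optional integer Laplace smoothing term (alpha=0 reproduces the raw counts).

```python
from collections import Counter
from fractions import Fraction

# The 18 rows of the table: (author, thread, length, where to read, action)
ROWS = [
    ("known", "new", "long", "home", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("unknown", "follow up", "long", "work", "skips"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "long", "work", "skips"),
    ("unknown", "new", "short", "work", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("unknown", "follow up", "short", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("known", "follow up", "short", "home", "reads"),
    ("known", "new", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("unknown", "new", "short", "work", "reads"),
]


def categorical_nb_scores(rows, query, alpha=0):
    """Unnormalized posteriors P(c) * prod_i P(x_i | c), estimated from counts.

    alpha (an int) adds Laplace smoothing to every conditional count.
    """
    labels = Counter(r[-1] for r in rows)
    # Number of distinct values each feature takes in the data
    n_values = [len({r[i] for r in rows}) for i in range(len(query))]
    scores = {}
    for c, n_c in labels.items():
        score = Fraction(n_c, len(rows))  # class prior
        for i, v in enumerate(query):
            match = sum(1 for r in rows if r[-1] == c and r[i] == v)
            score *= Fraction(match + alpha, n_c + alpha * n_values[i])
        scores[c] = score
    return scores


query = ("known", "follow up", "short", "home")
for alpha in (0, 1):
    scores = categorical_nb_scores(ROWS, query, alpha)
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    print(alpha, best, {c: float(s / total) for c, s in scores.items()})
```

With alpha=0 the posterior for reads is 9/14; with alpha=1 it drops to 5/8, and the prediction stays reads. scikit-learn's sklearn.naive_bayes.CategoricalNB implements the same idea, with a smoothing parameter also named alpha.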
With scikit-learn
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Same loading and encoding steps as the first program
pop = pd.read_csv('./Downloads/convertcsv.csv', dtype='category')
pop = pop.apply(lambda x: x.astype(str).str.strip().str.lower())
pop.columns = pop.columns.str.replace(' ', '').str.replace('’', '').str.replace("'", '')
# pd.factorize codes follow the order of first appearance (see the first program)
for col in ['Author', 'Thread', 'Length', 'Wheretoread', 'UsersAction']:
    pop[col] = pd.factorize(pop[col])[0]

# Naive Bayes
X = pop[['Author', 'Thread', 'Length', 'Wheretoread']].to_numpy()
Y = pop['UsersAction'].to_numpy()
clf = GaussianNB().fit(X, Y)
print(clf.score(X, Y))  # accuracy on the training data
# Query: known author, follow-up thread, short mail, read at home
Xtest = [[0, 1, 1, 0]]
print(clf.predict(Xtest))  # 1 means reads, 0 means skips
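Both programs predict reads, and the Length feature largely explains why: every mail in the reads class is short, so that feature's variance is zero within the class and only the smoothing term keeps the normal density finite, which makes the density for a short query very large. The sketch below recomputes the per-class log-scores in plain Python. It assumes the factorize codes listed in its comments, and its smoothing term loosely imitates scikit-learn's documented var_smoothing (a fraction of the largest feature variance); gaussian_nb_log_scores is a hypothetical helper.

```python
import math

# Table rows as the programs' pd.factorize codes (assumed row order):
# author known=0/unknown=1, thread new=0/follow up=1, length long=0/short=1,
# where home=0/work=1, action skips=0/reads=1
X = [
    [0, 0, 0, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1],
    [1, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 0], [0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1],
    [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1],
]
y = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]


def gaussian_nb_log_scores(X, y, x, var_smoothing=1e-9):
    """Log of P(c) * prod_j N(x_j; mean_jc, var_jc) for each class c."""
    n, d = len(X), len(X[0])

    def var(values):
        # Population variance (divide by the number of values)
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    # Assumed smoothing: var_smoothing times the largest feature variance over all rows
    eps = var_smoothing * max(var([row[j] for row in X]) for j in range(d))
    scores = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        log_score = math.log(len(rows) / n)  # log prior
        for j in range(d):
            col = [row[j] for row in rows]
            mean = sum(col) / len(col)
            v = var(col) + eps
            log_score += -0.5 * math.log(2 * math.pi * v) - (x[j] - mean) ** 2 / (2 * v)
        scores[c] = log_score
    return scores


scores = gaussian_nb_log_scores(X, y, [0, 1, 1, 0])
print(max(scores, key=scores.get))  # 1, i.e. reads
```

Because a Gaussian model treats the 0/1 codes as continuous values, CategoricalNB or BernoulliNB from sklearn.naive_bayes are arguably a closer fit for this kind of data.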