

Python 3

Rewrite the KNN sample code using KNeighborsClassifier.

● Repeat KNN Steps 1 to 5 at least five times and report the average accuracy as your result.

● If you use the latest version of scikit-learn, you need Python >= 3.5.

● Use the same dataset: “ ”

● Split your data: 67% for training and 33% for testing

● Draw a line chart: Use a “for loop” to change k from 1 to 10 and check your model accuracy.




import csv
import random
import math
import operator


def loadDataset(filename, split, trainingSet, testSet):
    # Read the CSV and randomly assign each row to the training or test set.
    with open(filename, 'r', newline='') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        # len(dataset)-1 skips the last row (typically a trailing blank line).
        for x in range(len(dataset)-1):
            # Convert the four measurement columns to floats.
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):
    # Euclidean distance over the first `length` columns.
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)


def getNeighbors(trainingSet, testInstance, k):
    # Return the k training rows closest to testInstance.
    distances = []
    length = len(testInstance)-1  # last column is the class label
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors


def getResponse(neighbors):
    # Majority vote over the class labels of the neighbors.
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]


def getAccuracy(testSet, predictions):
    # Percentage of test rows whose label matches the prediction.
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0


def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    # The dataset filename is blank in the assignment text; supply the CSV path here.
    loadDataset('', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions for k = 1..10
    k_values = range(10)
    for k in k_values:
        predictions = []
        for x in range(len(testSet)):
            neighbors = getNeighbors(trainingSet, testSet[x], k+1)
            result = getResponse(neighbors)
            predictions.append(result)
        accuracy = getAccuracy(testSet, predictions)
        print('k=' + repr(k+1) + ' Accuracy: ' + repr(accuracy) + '%')


main()



Homework Answers

Answer #1

KNN (K-Nearest Neighbors) is a simple supervised classification algorithm that assigns a class to a new data point; it can also be used for regression. KNN makes no assumptions about the data distribution, so it is non-parametric. It keeps all the training data and makes predictions by computing the similarity between an input sample and each training instance.

KNN can be summarized as below:

  1. Compute the distance between the new data point and every training example.
  2. Measure that distance with a metric such as Euclidean, Hamming, or Manhattan distance.
  3. Pick the K entries in the training data that are closest to the new data point.
  4. Take a majority vote: the most common class/label among those K entries becomes the class of the new data point.
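
The four steps above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation; the function names (`euclidean`, `manhattan`, `hamming`, `knn_predict`) and the toy points are made up for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Straight-line distance between two numeric vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # Number of positions at which two sequences differ.
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k, distance=euclidean):
    # Steps 1-2: distance from the query to every training example.
    scored = [(distance(p, query), label)
              for p, label in zip(train_points, train_labels)]
    # Step 3: keep the k closest entries.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Step 4: majority vote among their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labelled "A" and "B".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
print(prediction)
```

Passing `distance=manhattan` swaps the metric without touching the voting logic, which is the same separation scikit-learn exposes through its `metric` parameter.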

With K=3, Class B is assigned; with K=6, Class A is assigned.
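
The effect of K on the vote can be reproduced with a small made-up dataset. The coordinates below are hypothetical, chosen so that the two closest points belong to Class B while the next four belong to Class A; they are not taken from the original figure.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D points around the query (0, 0).
X = [
    [0.5, 0.0], [0.0, 0.5], [3.0, 3.0],                 # Class B
    [1.0, 0.0], [0.0, 1.1], [1.2, 0.0], [-1.3, 0.0],    # Class A
]
y = ["B", "B", "B", "A", "A", "A", "A"]
query = [[0.0, 0.0]]

# K=3: neighbors are B, B, A, so B wins the vote 2 to 1.
pred_k3 = KNeighborsClassifier(n_neighbors=3).fit(X, y).predict(query)

# K=6: neighbors are B, B, A, A, A, A, so A wins the vote 4 to 2.
pred_k6 = KNeighborsClassifier(n_neighbors=6).fit(X, y).predict(query)

print(pred_k3[0], pred_k6[0])
```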

Detailed documentation on KNN is available here.

The example below implements KNN on the Iris dataset using the scikit-learn library. The Iris dataset has 50 samples for each of three species of Iris flower (150 in total). For each sample we have the sepal length and width, the petal length and width, and a species name (the class/label).

Iris flower: sepal length, sepal width, petal length and width

  • 150 observations
  • 4 features (sepal length, sepal width, petal length, petal width)
  • Response variable is the iris species
  • Classification problem since response is categorical.

Our task is to build a KNN model that classifies the species of a new flower from its sepal and petal measurements. The Iris dataset ships with scikit-learn, so we can use it to build our KNN model.

Complete code can be found in the Git Repo.

Step 1: Import the required data and check the features.

Import the load_iris function from scikit-learn's datasets module and create an iris Bunch object (Bunch is scikit-learn’s special object type for storing datasets and their attributes).

Each observation represents one flower, and the 4 columns represent its 4 measurements. The features (measurements) are under the ‘data’ attribute and the labels under ‘target’; ‘feature_names’ and ‘target_names’ hold the corresponding names. The labels/responses are encoded as 0, 1 and 2 because scikit-learn models expect the features and response to be numeric (NumPy arrays) with a specific shape.
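
A short sketch of this step, using only `load_iris` and the attributes of the returned Bunch. The variable names `X` and `y` are a common convention, not something the original answer specifies.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris dataset as a Bunch object.
iris = load_iris()

X = iris.data      # feature matrix: one row per flower, one column per measurement
y = iris.target    # integer-encoded species labels

print(iris.feature_names)   # names of the four measurement columns
print(iris.target_names)    # species names corresponding to the integer labels
print(X.shape, y.shape)
```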

Step 2: Split the data and train the model.

Training and testing on the same data is not a sound approach, so we split the data into two pieces: a training set and a testing set. We use the ‘train_test_split’ function to split the data. The optional parameter ‘test_size’ determines the split percentage, and the ‘random_state’ parameter makes the split reproducible from run to run. Since we train and test on different sets of data, the resulting testing accuracy is a better estimate of how well the model is likely to perform on unseen data.
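
A minimal sketch of the split. The values `test_size=0.33` (to match the 67%/33% split the question asks for) and `random_state=42` are assumptions; the original answer does not state which values it used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size and random_state are assumed values for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

print(X_train.shape, X_test.shape)
```

Changing `random_state` produces a different partition of the same 150 rows, which is why repeated runs with different seeds can give different accuracies.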

Scikit-learn is organized into modules, so the relevant classes are easy to import. Import the class ‘KNeighborsClassifier’ from the ‘neighbors’ module and instantiate the estimator (‘estimator’ is scikit-learn’s term for a model; models are called estimators because their primary role is to estimate unknown quantities).

In our example we create an instance (‘knn’) of the class ‘KNeighborsClassifier’; in other words, we create an object called ‘knn’ that knows how to do KNN classification once we provide the data. The parameter ‘n_neighbors’ is the tuning parameter/hyperparameter (K). All other parameters are left at their default values.
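
Instantiating the estimator might look like the sketch below. `get_params()` is used only to show which settings were left at their defaults; `n_neighbors=5` is an example value.

```python
from sklearn.neighbors import KNeighborsClassifier

# Create the estimator; only K is set explicitly.
knn = KNeighborsClassifier(n_neighbors=5)

# Inspect the hyperparameters, including the ones left at their defaults.
params = knn.get_params()
print(params["n_neighbors"], params["weights"], params["metric"])
```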

The ‘fit’ method trains the model on the training data (X_train, y_train), and the ‘predict’ method generates predictions for the testing data (X_test). Choosing a good value of K is critical, so we fit and test the model for different values of K (from 1 to 25) in a for loop and record the testing accuracy for each in a variable (scores).
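
The loop described above could be written roughly as follows. The split settings (`test_size=0.33`, `random_state=42`) are assumptions for illustration, so the accuracies it prints will not necessarily match the figures quoted in this answer.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42  # assumed split settings
)

k_range = range(1, 26)
scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)          # train on the training set
    y_pred = knn.predict(X_test)       # predict labels for the testing set
    scores.append(accuracy_score(y_test, y_pred))

for k, score in zip(k_range, scores):
    print(k, round(score, 3))
```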

Plot the relationship between the values of K and the corresponding testing accuracy using the matplotlib library. The accuracy rises and then falls, which is typical when examining how model complexity relates to accuracy: as K increases, accuracy first rises and then falls again.
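
A possible way to draw that line chart is sketched below. The split settings, the axis labels, and the output filename `knn_accuracy_vs_k.png` are assumptions; the non-interactive `Agg` backend is chosen only so the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in an interactive session
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42  # assumed split settings
)

# Testing accuracy for each K from 1 to 25.
k_range = list(range(1, 26))
scores = [
    KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    for k in k_range
]

fig, ax = plt.subplots()
ax.plot(k_range, scores, marker="o")
ax.set_xlabel("Value of K for KNN")
ax.set_ylabel("Testing accuracy")

# Hypothetical output location for the chart.
out_path = os.path.join(tempfile.gettempdir(), "knn_accuracy_vs_k.png")
fig.savefig(out_path)
```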

In general, training accuracy rises as model complexity increases; for KNN, model complexity is determined by the value of K. A larger K leads to a smoother decision boundary (a less complex model), while a smaller K leads to a more complex model that may overfit. Testing accuracy penalizes models that are too complex (overfitting) or not complex enough (underfitting). We get the maximum testing accuracy when the model has the right level of complexity; in our case, for K values from 3 to 19 the model accuracy is 96.6%.
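
A single split can be noisy, and the question asks for the accuracy to be averaged over at least five runs with K from 1 to 10. One way to do that, while also comparing training and testing accuracy, is sketched here. Using the run index as `random_state` and `test_size=0.33` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

k_values = range(1, 11)
n_repeats = 5
train_acc = np.zeros((n_repeats, len(k_values)))
test_acc = np.zeros((n_repeats, len(k_values)))

for run in range(n_repeats):
    # A different seed per run gives a different 67%/33% split each time.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=run
    )
    for col, k in enumerate(k_values):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        train_acc[run, col] = knn.score(X_train, y_train)
        test_acc[run, col] = knn.score(X_test, y_test)

# Average over the five runs, one value per K.
mean_train = train_acc.mean(axis=0)
mean_test = test_acc.mean(axis=0)
for k, tr, te in zip(k_values, mean_train, mean_test):
    print(f"k={k:2d}  train={tr:.3f}  test={te:.3f}")
```

Comparing the two columns shows how far training accuracy runs ahead of testing accuracy at small K, which is the overfitting pattern described above.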

For our final model we can choose K = 5 (which falls between 3 and 19) and retrain the model on all the available data. That is our final model, ready to make predictions.
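
Retraining on the full dataset and predicting a new observation could look like this. The measurements in `new_flower` are illustrative values in the range typical of the first species (setosa), not data from the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Final model: K = 5, trained on all 150 observations.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# Illustrative measurements: sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.1, 3.5, 1.4, 0.2]]
pred = knn.predict(new_flower)

print(iris.target_names[pred[0]])
```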
