The mpg dataset has information on fuel mileage, engine information, and vehicle class of different cars stored in 234 rows and 11 columns. You want to use this information to predict the vehicle class of some new cars using the k-nearest neighbors algorithm. What would be a reasonable choice for k if you did not have time to experiment with all values of k?
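Each row of the dataset is one car and each column is one variable, so there are 234 cars. The 11 columns are features such as mileage and engine information, not additional cars, so 234 * 11 = 2574 counts data cells, not cars.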
The choice of k depends on the number of observations in the training set.
A common convention is the 70% / 30% rule: 70% of the observations are used as training data and 30% are held out as test (validation) data.
The number of cars in the training set is then 70% of 234 = 0.7 * 234 = 163.8, or about 164.
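The split arithmetic can be checked with a short plain-Python sketch. It assumes a simple random hold-out split with a 70% training fraction and a fixed seed; the function names `train_test_sizes` and `split_indices` are hypothetical helpers, not part of any library.

```python
import random

def train_test_sizes(n_rows, train_frac=0.7):
    """Return (train, test) row counts for a simple hold-out split."""
    n_train = round(n_rows * train_frac)  # 0.7 * 234 = 163.8 rounds to 164
    return n_train, n_rows - n_train

def split_indices(n_rows, train_frac=0.7, seed=0):
    """Shuffle row indices and cut them into training and test index lists."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_train, _ = train_test_sizes(n_rows, train_frac)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_indices(234)
print(len(train_idx), len(test_idx))  # 164 70
```

Only the row count (234) matters for the split; the number of columns plays no role.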
A widely used rule of thumb sets k to roughly the square root of the number of cars in the training set.
So k = √164 ≈ 12.81
Rounding to an odd number, which reduces the chance of tied votes, the k-value will be 13
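The square-root rule of thumb can be sketched as below. The helper name `rule_of_thumb_k` is hypothetical, and bumping an even result up to the next odd number is an assumed tie-breaking convention, not a fixed part of the heuristic.

```python
import math

def rule_of_thumb_k(n_train):
    """Heuristic k: square root of the training-set size, rounded, made odd."""
    k = round(math.sqrt(n_train))
    if k % 2 == 0:
        k += 1  # odd k avoids ties in two-class votes and reduces them otherwise
    return k

print(math.sqrt(164))        # about 12.81
print(rule_of_thumb_k(164))  # 13
```

This only gives a starting point; if time allows, trying a few values of k around 13 on the validation set is the usual way to confirm the choice.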