k-Nearest Neighbors in Python From Scratch

hashaki (25)in #blog • 6 years ago (edited)

What is k-Nearest Neighbors

The model for kNN is the entire training dataset. When a prediction is required for an unseen data instance, the kNN algorithm searches through the training dataset for the k most similar instances. The prediction attribute of those k instances is summarized and returned as the prediction for the unseen instance.

The similarity measure depends on the type of data. For real-valued data, the Euclidean distance can be used. For other types of data, such as categorical or binary data, the Hamming distance can be used.
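The code later in this post only covers the Euclidean case. For categorical or binary data, the Hamming distance just counts the positions where two instances differ. Here is a minimal sketch, assuming both instances have at least `lengh` attributes; the name `hammingDistance` is my own illustration and is not used anywhere else in the post:

```python
def hammingDistance(instance1, instance2, lengh):
    # Count the positions where the two instances disagree.
    distance = 0
    for i in range(lengh):
        if instance1[i] != instance2[i]:
            distance += 1
    return distance

# Two of the three attributes differ.
print(hammingDistance(['red', 'small', 'yes'], ['red', 'large', 'no'], 3))  # 2
```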

In the case of regression problems, the average of the predicted attribute may be returned. In the case of classification, the most prevalent class may be returned.
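The two ways of summarizing the neighbors can be sketched as follows. This is only an illustration under the assumption that the neighbors' target values (or labels) have already been collected into a list; the function names `summarizeRegression` and `summarizeClassification` are hypothetical:

```python
from collections import Counter

def summarizeRegression(neighborValues):
    # Regression: average the target values of the k neighbors.
    return sum(neighborValues) / float(len(neighborValues))

def summarizeClassification(neighborLabels):
    # Classification: return the label that occurs most often.
    return Counter(neighborLabels).most_common(1)[0][0]

print(summarizeRegression([2.0, 3.0, 4.0]))      # 3.0
print(summarizeClassification(['a', 'b', 'a']))  # a
```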

I am too hungry to introduce more. I have to go out to find some food :)
All the code is here:

import math
import operator as op

a=[1,2,3,3,2,1,'a']
b=[7,8,9,9,8,7,'b']
c=[1,1,1,1,1,1]

# Sum of squared differences: compute the distance between two samples
def euclieanDistance(instance1, instance2, lengh):
    distance = 0
    for i in range(lengh):
        distance += pow(instance1[i] - instance2[i], 2)
    return math.sqrt(distance)

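# Use the distances to find the k nearest training rows
def getNeighbors(trainSet, testInstance, k):
    distance = []
    # Number of features to compare. The test instance here has no label;
    # if yours ends with a label, use len(testInstance) - 1 instead.
    lengh = len(testInstance)
    for i in range(len(trainSet)):
        dist = euclieanDistance(testInstance, trainSet[i], lengh)
        distance.append((trainSet[i], dist))
    distance.sort(key=op.itemgetter(1))  # sort by the second item (the distance), ascending
    neighbor = []
    for j in range(k):
        neighbor.append(distance[j][0])
    return neighbor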

# Look at the labels of the nearest rows and rank the labels by vote count
def getResponse(neighbor):
    classVote = {}
    for i in range(len(neighbor)):
        response = neighbor[i][-1]  # the last item of each row is its label
        if response in classVote:
            classVote[response] += 1
        else:
            classVote[response] = 1
    # reverse=True sorts in descending order, so the label with the most votes comes first
    sortVote = sorted(classVote.items(), key=op.itemgetter(1), reverse=True)
    return sortVote[0][0]

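# Compute the accuracy: the percentage of predictions that match the true labels
def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        # compare values with ==; "is" checks object identity, which is not reliable here
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0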

if __name__ == '__main__':
    predictions = []
    neighbor = getNeighbors([a, b], c, 2)
    print('neighbor', neighbor)
    response = getResponse(neighbor)
    print('response', response)
    predictions.append(response)
    print('predictions', predictions)
    print(getAccuracy([[6, 6, 6, 6, 6, 6, 'b']], predictions))

output:
neighbor [[1, 2, 3, 3, 2, 1, 'a'], [7, 8, 9, 9, 8, 7, 'b']]
response a
predictions ['a']
0.0
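
One thing worth noticing about this output: with k=2 and only two training rows, each label gets exactly one vote, so the response a is really the winner of a tie. The sketch below shows why the nearest neighbor's label wins in that case. It assumes Python 3.7+ (dicts keep insertion order), and the helper name `countVotes` is hypothetical:

```python
import operator as op

def countVotes(labels):
    # Tally labels in the order the neighbors are visited (nearest first).
    classVote = {}
    for label in labels:
        classVote[label] = classVote.get(label, 0) + 1
    # sorted() is stable, even with reverse=True, so tied labels keep
    # their insertion order and the nearest neighbor's label stays first.
    return sorted(classVote.items(), key=op.itemgetter(1), reverse=True)

print(countVotes(['a', 'b']))       # [('a', 1), ('b', 1)]
print(countVotes(['a', 'b', 'b']))  # [('b', 2), ('a', 1)]
```

Picking an odd k is a common way to avoid such ties when there are only two classes.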
