100 Days Of ML Code: Day 7/11 - KNN
The index article for the 100-day machine learning challenge is linked here.
Step 1: Data Preprocessing
Since the same dataset is used, this step is exactly the same as in Day 6 (Logistic Regression).
import pandas as pd
import numpy as np
df = pd.read_csv('Social_Network_Ads.csv')
# print(df)
X = df.iloc[:, 2:4].values  # feature columns 2 and 3 (Age, EstimatedSalary)
Y = df.iloc[:, 4].values    # target column 4 (Purchased)
# print(X)
# print(Y)
from sklearn.model_selection import train_test_split  # sklearn.cross_validation has been removed in newer versions
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
# print(X_train)
# feature scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training set only
X_test = scaler.transform(X_test)        # reuse the training-set statistics on the test set
# print(X_train)
# print(X_test)
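As a quick sanity check (a minimal sketch, not part of the original tutorial), the scaled training features should now have mean close to 0 and standard deviation close to 1:
# after StandardScaler, each training column should have mean ~0 and std ~1
print(X_train.mean(axis=0))
print(X_train.std(axis=0))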
Step 2: Apply KNN to the Training Set
The documentation page for KNeighborsClassifier is here.
from sklearn.neighbors import KNeighborsClassifier
k = 5  # k is the number of nearest neighbors
neigh = KNeighborsClassifier(n_neighbors=k)
neigh.fit(X_train, Y_train)
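Once fitted, the classifier can also report which training samples are nearest to a query point, which makes the "k nearest neighbors" idea concrete. A minimal sketch (assuming the scaled X_test from Step 1 is available):
# distances and row indices of the k nearest training samples for the first test point
dist, idx = neigh.kneighbors(X_test[:1], n_neighbors=k)
print(dist)          # distances to the k nearest neighbors
print(idx)           # indices of those neighbors in X_train
print(Y_train[idx])  # their labels; the prediction is the majority vote among them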
The reference solution configures the KNN classifier as:
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
The meaning of these two parameters (quoted from the scikit-learn documentation) is:
p : integer, optional (default = 2)
Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
metric : string or callable, default ‘minkowski’
the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
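To make the p parameter concrete, here is a small illustrative sketch (not part of the original tutorial) showing that Minkowski distance with p = 1 equals Manhattan distance and with p = 2 equals Euclidean distance:
import numpy as np
from scipy.spatial.distance import minkowski, cityblock, euclidean
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
# p = 1: Minkowski reduces to Manhattan (L1) distance
print(minkowski(a, b, p=1), cityblock(a, b))   # 7.0 7.0
# p = 2: Minkowski reduces to Euclidean (L2) distance
print(minkowski(a, b, p=2), euclidean(a, b))   # 5.0 5.0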
Step 3: Prediction
Y_pred = neigh.predict(X_test)
# print(Y_pred)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test, Y_pred)  # rows: true labels, columns: predicted labels
# print(cm)
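The overall accuracy is the fraction of predictions on the diagonal of the confusion matrix; a short sketch of how it could be computed (equivalently via sklearn's accuracy_score):
from sklearn.metrics import accuracy_score
accuracy = np.trace(cm) / cm.sum()     # correct predictions / all predictions
print(accuracy)
print(accuracy_score(Y_test, Y_pred))  # should give the same value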