sklearn Clustering: KMeans
程序员文章站
2022-03-22 17:39:22
KMeans is the simplest clustering algorithm. It partitions a feature matrix of N samples into K disjoint clusters. Intuitively, the clusters are groups of data points gathered together, and the points within one cluster are considered to belong to the same class.
n_clusters
This is the k in KMeans: the number of clusters the model partitions the data into. It is the key hyperparameter (default: 8), but in practice the appropriate number of clusters is usually smaller than 8 and has to be chosen by us.
Code (visualizing the distribution of the dataset):
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# 500 samples with 2 features, drawn around 4 ground-truth centers
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

fig, ax1 = plt.subplots(1)
color = ["red", "pink", "orange", "gray"]
for i in range(4):
    # plot the samples of ground-truth group i in their own color
    ax1.scatter(X[y == i, 0], X[y == i, 1], marker='o', s=8, c=color[i])
plt.show()
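Since make_blobs also returns the ground-truth group of every sample, we can check how well a KMeans fit recovers those groups. The snippet below is an illustrative sketch, not part of the original post: the cluster ids KMeans assigns are arbitrary (its cluster 0 need not be blob 0), so we compare partitions with a permutation-invariant score rather than testing `y == labels_` directly.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
km = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)

# adjusted_rand_score is invariant to relabeling of the clusters;
# it approaches 1.0 when the partition matches the ground truth well
score = adjusted_rand_score(y, km.labels_)
print(score)
```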
Fitting KMeans, inspecting its attributes, and computing the inertia for 1 to 10 clusters:
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

cluster = KMeans(n_clusters=3, random_state=0).fit(X)
y_pred = cluster.labels_  # cluster label assigned to each sample
print(y_pred)
print("---------------------------------------------------------------------")
pre = cluster.fit_predict(X)  # fit + predict in one step; same result as labels_
print(pre == y_pred)
print("---------------------------------------------------------------------")
centroid = cluster.cluster_centers_  # coordinates of the centroids
print(centroid)
print(centroid.shape)  # (3, 2): 3 clusters, 2 features
print("---------------------------------------------------------------------")
inertia = cluster.inertia_  # sum of squared distances from each sample to its centroid
print(inertia)
print("---------------------------------------------------------------------")

# plot the 3 predicted clusters and mark the centroids with an "x"
color = ["red", "pink", "orange", "gray"]
fig, ax1 = plt.subplots(1)
for i in range(3):
    ax1.scatter(X[y_pred == i, 0], X[y_pred == i, 1], marker='o', s=8, c=color[i])
ax1.scatter(centroid[:, 0], centroid[:, 1], marker='x', s=15, c="black")
plt.show()

# with 4 clusters the inertia drops noticeably
cluster = KMeans(n_clusters=4, random_state=0).fit(X)
print(cluster.inertia_)

# inertia for k = 1 .. 10
for i in range(1, 11):
    cluster = KMeans(n_clusters=i, random_state=0).fit(X)
    print(cluster.inertia_)
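The loop above prints one inertia value per k. As a sketch (not from the original post), we can verify by hand what inertia_ measures and turn the loop into an elbow plot. Inertia always decreases as k grows, so the useful k is not where inertia is smallest but where the decrease levels off:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

# inertia_ is the sum of squared distances from each sample to its
# assigned centroid; recompute it by hand for one model to confirm
km = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)
manual = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(manual, km.inertia_)  # the two values agree

# elbow plot: inertia vs. k for k = 1 .. 10
ks = list(range(1, 11))
inertias = [KMeans(n_clusters=k, random_state=0, n_init=10).fit(X).inertia_
            for k in ks]
plt.plot(ks, inertias, marker="o")
plt.xlabel("n_clusters (k)")
plt.ylabel("inertia")
plt.show()
```

On this dataset the curve bends sharply around k = 4, matching the number of centers make_blobs was asked to generate.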