
sklearn Clustering: KMeans

程序员文章站 2022-03-22 17:39:22

KMeans is the simplest clustering algorithm. It partitions a feature matrix of N samples into K disjoint clusters. Intuitively, a cluster is a group of data points gathered together, and the points within one cluster are treated as belonging to the same class.

n_clusters

This is the k in KMeans: the number of clusters the model partitions the data into. In sklearn it is not required and defaults to 8, but in practice the appropriate k is usually smaller than 8 and has to be chosen by the user.
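A quick check of that default (a minimal sketch; it simply reads the attribute sklearn exposes on an unfitted estimator):

```python
from sklearn.cluster import KMeans

# Constructed without arguments, KMeans falls back to n_clusters=8
km = KMeans()
print(km.n_clusters)  # 8
```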
Code example (visualizing the data distribution of the dataset):

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate 500 two-dimensional samples drawn around 4 centers
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
fig, ax1 = plt.subplots(1)

# Plot each true cluster in its own color
color = ["red", "pink", "orange", "gray"]
for i in range(4):
    ax1.scatter(X[y == i, 0], X[y == i, 1],
                marker='o', s=8, c=color[i])

plt.show()

(Figure: scatter plot of the dataset, colored by true cluster.)
Full example: fit KMeans, inspect its attributes, and try cluster counts from 1 to 9:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

# Fit KMeans with 3 clusters; labels_ holds each sample's cluster index
cluster = KMeans(n_clusters=3, random_state=0).fit(X)
y_pred = cluster.labels_
print(y_pred)
print("---------------------------------------------------------------------")

# fit_predict fits and returns labels in one step; it matches labels_
pre = cluster.fit_predict(X)
print(pre == y_pred)
print("---------------------------------------------------------------------")

# Coordinates of the 3 centroids, shape (3, 2)
centroid = cluster.cluster_centers_
print(centroid)
print(centroid.shape)
print("---------------------------------------------------------------------")

# Inertia: sum of squared distances from each sample to its nearest centroid
inertia = cluster.inertia_
print(inertia)
print("---------------------------------------------------------------------")

# color = ["red", "pink", "orange", "gray"]
# fig, ax1 = plt.subplots(1)
#
# for i in range(3):
#     ax1.scatter(X[y_pred == i, 0], X[y_pred == i, 1],
#                 marker='o', s=8, c=color[i])
#
# ax1.scatter(centroid[:, 0], centroid[:, 1],
#             marker='x', s=15, c="black")
#
# plt.show()

# Refit with 4 clusters; inertia drops because more centroids fit the data more tightly
cluster = KMeans(n_clusters=4, random_state=0).fit(X)
print(cluster.inertia_)

# Inertia keeps decreasing as k grows, so inertia alone cannot pick the best k
for i in range(1, 10):
    cluster = KMeans(n_clusters=i, random_state=0).fit(X)
    print(cluster.inertia_)
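Because inertia decreases monotonically with k, a common heuristic (not shown in the original article) is the elbow method: plot inertia against k and pick the k where the curve bends sharply. A minimal sketch, reusing the same make_blobs data as above:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

# Collect inertia for each candidate cluster count
ks = range(1, 10)
inertias = [KMeans(n_clusters=k, random_state=0).fit(X).inertia_ for k in ks]

# The sharpest bend ("elbow") suggests a reasonable k; with 4 true centers
# the curve should flatten out after k=4
plt.plot(list(ks), inertias, marker='o')
plt.xlabel("n_clusters")
plt.ylabel("inertia")
plt.show()
```

This only narrows the choice of k; metrics such as the silhouette coefficient are often used alongside it.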
Tags: Sklearn