Machine Learning Series: Clustering
2022-07-14 19:33:19
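Three commonly used clustering algorithms:
(1) Basic K-means: a prototype-based, partitional clustering technique that tries to find a user-specified number of clusters among all the data objects.
(2) Agglomerative hierarchical clustering: start with every point as its own cluster, then repeatedly merge the two closest clusters until the specified number of clusters remains.
(3) DBSCAN: a density-based clustering algorithm that produces a partitional clustering.
Their pros and cons are not repeated here; see https://www.cnblogs.com/giserliu/archive/2015/04/05/4394807.html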
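To make item (2) concrete, here is a minimal sketch of agglomerative hierarchical clustering. It is not taken from the linked article: the function names `single_link_distance` and `agglomerative` are made up for illustration, and it assumes single-link distance (the closest pair of points between two clusters), which is only one of several possible definitions of "closest clusters". It needs Python 3.8+ for `math.dist`.

```python
import math

def single_link_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any pair of points from the two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical demo data: two tight groups and one isolated point
demo_points = [(1, 1), (2, 1), (1, 2), (10, 10), (11, 10), (50, 50)]
demo_clusters = agglomerative(demo_points, 3)
print(demo_clusters)
```

This brute-force version recomputes every pairwise cluster distance on each merge, so it is only suitable for small point sets like the 80 random points used in this post.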
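#!/usr/bin/env python
# -*- coding:utf-8 -*-
__author__ = 'Great'
"""
Basic K-means with three fixed initial centroids:
Specify three centroids
Compute the distance from each point to the three centroids
Assign each point to a cluster
Update the centroids
Record the centroids
Show the clusters and the centroid trajectories
"""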
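import numpy as np
# Randomly generate 80 sample points: x in [1, 100), y in [1, 150)
a = np.random.randint(1, 100, 80, dtype='int64')
b = np.random.randint(1, 150, 80)
points = []
for i in range(80):
    points.append((a[i], b[i]))
# pyplot provides the plot() and show() calls used below
import matplotlib.pyplot as pl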
# Specify the initial centroids
current_point1 = [22, 111]
current_point2 = [100, 12]
current_point3 = [50, 56]
# Plot the initial centroids as black dots
pl.plot([current_point1[0]], [current_point1[1]], 'ok')
pl.plot([current_point2[0]], [current_point2[1]], 'ok')
pl.plot([current_point3[0]], [current_point3[1]], 'ok')
# Record the centroid trajectories
current1 = [current_point1]
current2 = [current_point2]
current3 = [current_point3]
# The three clusters
group1 = []
group2 = []
group3 = []
for cost_time in range(100):
    group1 = []; group2 = []; group3 = []
    for onepoint in points:
        # Squared Euclidean distance to each centroid (the square root is not needed for comparison)
        distance1 = pow(abs(onepoint[0] - current_point1[0]), 2) + pow(abs(onepoint[1] - current_point1[1]), 2)
        distance2 = pow(abs(onepoint[0] - current_point2[0]), 2) + pow(abs(onepoint[1] - current_point2[1]), 2)
        distance3 = pow(abs(onepoint[0] - current_point3[0]), 2) + pow(abs(onepoint[1] - current_point3[1]), 2)
        # Assign the point to the nearest cluster; elif/else keeps a tied point in exactly one cluster
        min_len = min(distance1, distance2, distance3)
        if min_len == distance1:
            group1.append(onepoint)
        elif min_len == distance2:
            group2.append(onepoint)
        else:
            group3.append(onepoint)
    # Update the centroids; an empty cluster keeps its old centroid to avoid dividing by zero
    if group1:
        current_point1 = [sum([onepoint[0] for onepoint in group1])/len(group1), sum([onepoint[1] for onepoint in group1])/len(group1)]
    if group2:
        current_point2 = [sum([onepoint[0] for onepoint in group2])/len(group2), sum([onepoint[1] for onepoint in group2])/len(group2)]
    if group3:
        current_point3 = [sum([onepoint[0] for onepoint in group3])/len(group3), sum([onepoint[1] for onepoint in group3])/len(group3)]
    # Record the new centroids
    current1.append(current_point1)
    current2.append(current_point2)
    current3.append(current_point3)
# Plot the clusters (red, yellow, green)
pl.plot([onepoint[0] for onepoint in group1], [onepoint[1] for onepoint in group1], "or")
pl.plot([onepoint[0] for onepoint in group2], [onepoint[1] for onepoint in group2], "oy")
pl.plot([onepoint[0] for onepoint in group3], [onepoint[1] for onepoint in group3], "og")
# Plot the centroid trajectories
for center in [current1, current2, current3]:
    pl.plot([eachcenter[0] for eachcenter in center], [eachcenter[1] for eachcenter in center])
# Show the figure
pl.show()
# Print the final centroids and the members of each cluster
print(current_point1, current_point2, current_point3)
print(group1)
print(group2)
print(group3)
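The script above hard-codes three centroids and always runs 100 iterations. As a rough sketch of how the same assign/update steps could be generalized, the version below works for any number of centroids and stops as soon as the centroids no longer move. The names `squared_distance` and `kmeans` and the tiny demo data are my own assumptions for illustration, not part of the original script, and it uses only the standard library.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Basic K-means for any number of initial centroids.

    Returns (centroids, groups), where groups[j] lists the points assigned to centroid j.
    """
    centroids = [[float(x) for x in c] for c in centroids]
    groups = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to exactly one nearest centroid
        # (on a tie, min() picks the lowest centroid index)
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: squared_distance(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: mean of each non-empty group; an empty group keeps its old centroid
        new_centroids = [
            [sum(coord) / len(g) for coord in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, groups

# Hypothetical demo: two obvious groups, two initial centroids
demo_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
demo_centroids, demo_groups = kmeans(demo_points, [[1, 1], [9, 1]])
print(demo_centroids)  # [[0.0, 1.0], [10.0, 1.0]]
print(demo_groups)     # [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
```

Calling `kmeans(points, [[22, 111], [100, 12], [50, 56]])` with the random `points` list from the script should follow the same steps as the loop above, just with an early exit.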
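This post only implements the basic K-means clustering algorithm and does not try to optimize the clustering result; the goal is simply to get familiar with the computation. The sample set is randomly generated. The procedure mainly follows https://www.oschina.net/code/snippet_176897_14731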
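Since the result here is not optimized, one possible next step (not done in this post) is to measure how good a clustering is. A common measure for K-means is the sum of squared errors (SSE): the total squared distance from every point to the centroid of its cluster. The sketch below is only an illustration; the function name `sse` and the sample values are assumptions of mine.

```python
def sse(centroids, groups):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for centroid, group in zip(centroids, groups):
        for point in group:
            total += sum((a - b) ** 2 for a, b in zip(point, centroid))
    return total

# Hypothetical clusters: the same grouping scored against two candidate centroid sets
demo_groups = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
good_centroids = [[0.0, 1.0], [10.0, 1.0]]  # the mean of each group
poor_centroids = [[0.0, 0.0], [10.0, 1.0]]  # first centroid sits on a data point

print(sse(good_centroids, demo_groups))  # 4.0
print(sse(poor_centroids, demo_groups))  # 6.0
```

A lower SSE means tighter clusters, so one simple way to improve a result is to run K-means from several different initial centroids and keep the run with the lowest SSE.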
Here's hoping I keep getting better. Keep going! I'm on the road to growth.