Implementing Feature Selection for Machine-Learning Decision Trees: Information Entropy
程序员文章站
2024-02-15 15:07:28
First, let us state the definition of information entropy:
Entropy: for a random variable that takes K values with probabilities p_1, ..., p_K, the entropy is H = -Σ_k p_k · log2(p_k). Empirical entropy: when those probabilities are estimated from the frequencies observed in a dataset D, the resulting value H(D) is called the empirical entropy of D.
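The definition above can be sketched directly in Python (the helper name `entropy` is my own, not from the original post):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution.
    By convention 0 * log2(0) is treated as 0, so zero
    probabilities are simply skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # a fair coin carries exactly 1 bit
print(entropy([1.0]))       # a certain outcome carries 0 bits
```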
Here we use a loan-application dataset found online as the example; the data comes from (https://blog.csdn.net/c406495762/article/details/75663451).
Following the ID3 algorithm, we compute the information entropy of the three branches of the age attribute: youth H(D1), middle-aged H(D2), and elderly H(D3).
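As a quick check before the full program: in the data below, 2 of the 5 youth samples are 'yes', so H(D1) = -(2/5 · log2(2/5) + 3/5 · log2(3/5)). A one-off sketch of that calculation:

```python
from math import log2

# Youth branch D1: 2 'yes' and 3 'no' out of 5 samples
p_yes, p_no = 2 / 5, 3 / 5
h_d1 = -(p_yes * log2(p_yes) + p_no * log2(p_no))
print(round(h_d1, 3))  # 0.971
```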
The problem is easy to understand and the formula is simple; the code is as follows:
"""
年龄:0代表青年,1代表中年,2代表老年
类别(是否给贷款):no代表否,yes代表是。
"""
from math import log
def funbasic(databases):
numlable_0 = 0
numlable_1 = 0
numlable_2 = 0
lable_0_yes = 0
lable_0_no = 0
lable_1_yes = 0
lable_1_no = 0
lable_2_yes = 0
lable_2_no = 0
sevlable_0_yes = 0.0
sevlable_0_no = 0.0
sevlable_1_yes = 0.0
sevlable_1_no = 0.0
sevlable_2_yes = 0.0
sevlable_2_no = 0.0
for lable in databases: #分别对青年、中年、老年中是否贷款的概率进行计算
if lable[0] == 0:
numlable_0 += 1
if lable[1] == 'yes':
lable_0_yes += 1
else:
lable_0_no += 1
elif lable[0] == 1:
numlable_1 += 1
if lable[1] == 'yes':
lable_1_yes += 1
else:
lable_1_no += 1
else:
numlable_2 += 1
if lable[1] == 'yes':
lable_2_yes += 1
else:
lable_2_no += 1
sevlable_0_yes = lable_0_yes/numlable_0
sevlable_0_no = lable_0_no/numlable_0
sevlable_1_yes = lable_1_yes/numlable_1
sevlable_1_no = lable_1_no/numlable_1
sevlable_2_yes = lable_2_yes/numlable_2
sevlable_2_no = lable_2_no/numlable_2
database1 = [[sevlable_0_yes,sevlable_0_no],[sevlable_1_yes,sevlable_1_no],[sevlable_2_yes,sevlable_2_no]]
return database1
def fun(database): #信息熵的计算
i = 0
for sevlable in database:
information = 0.0
for sve in sevlable:
information -= sve * log(sve,2)
print (i,':',information)
i += 1
if __name__ == '__main__':
databases = [[0,'no'],
[0,'no'],
[0,'yes'],
[0,'yes'],
[0,'no'],
[1,'no'],
[1,'no'],
[1,'yes'],
[1,'yes'],
[1,'yes'],
[2,'yes'],
[2,'yes'],
[2,'yes'],
[2,'yes'],
[2,'no']]
database = funbasic(databases)
print(database)
fun(database)
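The counters above are hard-coded for exactly three age groups and two classes. A more general sketch (my refactor, not the original author's code) groups the class labels by feature value and computes each branch entropy with collections.Counter:

```python
from collections import Counter
from math import log2

def branch_entropies(rows):
    """Entropy of each branch of a feature, for rows of [feature_value, class_label]."""
    groups = {}
    for feature, label in rows:           # group class labels by feature value
        groups.setdefault(feature, []).append(label)
    result = {}
    for feature, labels in groups.items():
        counts = Counter(labels)          # class frequencies within this branch
        total = len(labels)
        result[feature] = -sum(
            (c / total) * log2(c / total) for c in counts.values()
        )
    return result

rows = [[0, 'no'], [0, 'no'], [0, 'yes'], [0, 'yes'], [0, 'no'],
        [1, 'no'], [1, 'no'], [1, 'yes'], [1, 'yes'], [1, 'yes'],
        [2, 'yes'], [2, 'yes'], [2, 'yes'], [2, 'yes'], [2, 'no']]
print(branch_entropies(rows))
```

This version works unchanged if the feature has more than three values or the classification has more than two classes.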
Run result: funbasic returns the probability list [[0.4, 0.6], [0.6, 0.4], [0.8, 0.2]], and fun then prints the entropy of each branch: H(D1) ≈ 0.971, H(D2) ≈ 0.971, H(D3) ≈ 0.722.
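In ID3 these branch entropies are not used directly: the feature is scored by its information gain, g(D, age) = H(D) - Σ_i (|Di|/|D|) · H(Di). A hedged sketch of that next step (the helper names are mine, not from the original post):

```python
from math import log2

def entropy(labels):
    """Empirical entropy of a list of class labels."""
    total = len(labels)
    counts = {}
    for x in labels:
        counts[x] = counts.get(x, 0) + 1
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows):
    """Information gain of the feature in column 0, for rows of [feature, label]."""
    labels = [label for _, label in rows]
    h_d = entropy(labels)                 # entropy of the whole dataset
    groups = {}
    for feature, label in rows:
        groups.setdefault(feature, []).append(label)
    # Conditional entropy H(D|A): weighted average of the branch entropies
    h_cond = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return h_d - h_cond

rows = [[0, 'no'], [0, 'no'], [0, 'yes'], [0, 'yes'], [0, 'no'],
        [1, 'no'], [1, 'no'], [1, 'yes'], [1, 'yes'], [1, 'yes'],
        [2, 'yes'], [2, 'yes'], [2, 'yes'], [2, 'yes'], [2, 'no']]
print(round(information_gain(rows), 3))  # 0.083
```

For this dataset the gain of the age feature works out to about 0.083; ID3 would compute this for every candidate feature and split on the one with the largest gain.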