Computing Information Gain for Feature Selection in Python
程序员文章站
2023-10-28 08:51:22
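This article uses Python to compute information gain for feature selection. The implementation handles feature sets that mix continuous attributes with binary discrete attributes.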
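A senior labmate asked me to write some feature-selection code. I searched online, and most of what I found computes information gain only for discrete attributes. My data contains both binary discrete and continuous attributes, so I implemented it myself here.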
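For reference, the quantity computed by the code in this article can be written out as below. This is a sketch reconstructed from that code, not a formula given by the author: the symbols $L_t$ and $R_t$ (the samples below and above a threshold $t$) are notation introduced here, and the logarithm is the natural one because the code calls `math.log`. A binary 0/1 attribute has a single candidate threshold, 0.5, so it is handled by the same rule as a continuous attribute.

```latex
% Entropy of the labels (n samples, p_c = fraction of samples with class c)
H(Y) = -\sum_{c} p_c \ln p_c

% Gain of splitting feature j at threshold t
IG(Y; X_j, t) = H(Y) - \left( \frac{|L_t|}{n}\, H(Y_{L_t}) + \frac{|R_t|}{n}\, H(Y_{R_t}) \right),
\qquad L_t = \{ i : x_{ij} < t \}, \quad R_t = \{ i : x_{ij} > t \}

% Score reported for feature j: the best gain over all candidate thresholds
IG(Y; X_j) = \max_{t} IG(Y; X_j, t)
```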
Code:
import math

import numpy as np


def entropy(labels):
    """Shannon entropy (natural log, in nats) of a list of class labels."""
    n = len(labels)
    # An empty list yields 0 because the generator never runs.
    return sum(-(labels.count(c) / n) * math.log(labels.count(c) / n)
               for c in set(labels))


class ig():
    def __init__(self, x, y):
        x = np.array(x)
        y = list(y)  # list.count is used in entropy(), so accept any sequence
        n_feature = np.shape(x)[1]
        n_y = len(y)
        orig_h = entropy(y)  # entropy of the labels before any split
        condi_h_list = []
        for i in range(n_feature):
            feature = x[:, i]
            # Candidate thresholds: midpoints between consecutive distinct
            # values, so no sample ever sits exactly on a threshold.
            values = sorted(set(feature.tolist()))
            thre_set = [(values[k - 1] + values[k]) / 2
                        for k in range(1, len(values))]
            pre_h = 0  # best gain so far; stays 0 for a constant feature
            for thre in thre_set:
                lower = [y[s] for s in range(n_y) if feature[s] < thre]
                higher = [y[s] for s in range(n_y) if feature[s] > thre]
                # Conditional entropy of the labels given this binary split
                temp_condi_h = (len(lower) / n_y * entropy(lower)
                                + len(higher) / n_y * entropy(higher))
                condi_h = orig_h - temp_condi_h  # information gain at thre
                pre_h = max(pre_h, condi_h)
            condi_h_list.append(pre_h)
        self.ig = condi_h_list  # one information-gain value per feature

    def getig(self):
        return self.ig


if __name__ == "__main__":
    x = [[1, 0, 0, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
    y = [0, 0, 1]
    print(ig(x, y).getig())
The output is:
[0.17441604792151594, 0.17441604792151594, 0.17441604792151594, 0.6365141682948128]
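These numbers can be checked by hand. The snippet below is a minimal sketch that recomputes them from class counts alone, assuming natural-log entropy as in the code above. The helper `entropy_from_counts` and the variable names are illustrative and not part of the original code.

```python
import math


def entropy_from_counts(counts):
    """Entropy in nats of a label distribution given as class counts."""
    n = sum(counts)
    return sum(-(c / n) * math.log(c / n) for c in counts if c > 0)


# Labels y = [0, 0, 1]: two samples of class 0 and one of class 1.
h_y = entropy_from_counts([2, 1])

# Fourth feature [1, 1, 0]: splitting at 0.5 leaves two pure groups
# (one sample on one side, two on the other), so the gain equals H(Y).
gain_pure = h_y - (1 / 3 * entropy_from_counts([1])
                   + 2 / 3 * entropy_from_counts([2]))

# First three features: splitting at 0.5 leaves one pure single-sample
# group and one two-sample group holding one label of each class.
gain_mixed = h_y - (2 / 3 * entropy_from_counts([1, 1])
                    + 1 / 3 * entropy_from_counts([1]))

print(h_y, gain_pure, gain_mixed)  # roughly 0.6365, 0.6365, 0.1744
```

In closed form, the two values are ln 3 - (2/3) ln 2 for the fourth feature and ln 3 - (4/3) ln 2 for the other three.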
That is all for this article. I hope it helps with your studies, and thank you for your support.