SVM Model Application (5): Selecting X Features with a Randomized Logistic Regression Model to Improve SVM Prediction

程序员文章站 2022-07-15 10:52:38

import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.linear_model import LogisticRegression

my_matrix=np.loadtxt("D:/data/pima-indians-diabetes.txt",delimiter=",",skiprows=0) 

lenth_x=len(my_matrix[0])

data_y=my_matrix[:,lenth_x-1]

data_x=my_matrix[:,0:lenth_x-1]
print(data_x[0])

[   6.     148.      72.      35.       0.      33.6      0.627   50.   ]

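from sklearn.linear_model import RandomizedLogisticRegression as RLR  # deprecated in scikit-learn 0.19, removed in 0.21
rlr = RLR()  # build a randomized logistic regression model to select features
rlr.fit(data_x, data_y)  # train the model
rlr.get_support()  # boolean mask of the selected features; per-feature scores are in the .scores_ attribute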

array([ True,  True, False, False, False,  True,  True,  True], dtype=bool)
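Because RandomizedLogisticRegression is no longer shipped with recent scikit-learn releases, the selection step above can be approximated by hand. The following is only a minimal sketch of the general idea (stability selection): fit an L1-penalized logistic regression on many random subsamples and count how often each feature keeps a non-zero coefficient. The function name `stability_scores`, the parameters `shrink` and `threshold`, their default values, and the synthetic data are all assumptions made for illustration; this is not the library's implementation and will not necessarily reproduce the mask shown above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def stability_scores(X, y, n_resampling=50, sample_fraction=0.75,
                     shrink=0.5, C=0.05, seed=0):
    """Fraction of resampling rounds in which each feature receives a
    non-zero coefficient from an L1-penalized logistic regression."""
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    n_sub = int(sample_fraction * n_samples)
    counts = np.zeros(n_features)
    for _ in range(n_resampling):
        # Draw a random subsample of the rows (without replacement).
        rows = rng.choice(n_samples, size=n_sub, replace=False)
        # Assumption: randomly shrink about half of the columns so that
        # correlated features take turns being picked.
        weights = np.where(rng.rand(n_features) < 0.5, shrink, 1.0)
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X[rows] * weights, y[rows])
        counts += (model.coef_.ravel() != 0)
    return counts / n_resampling


# Synthetic data (not the diabetes data set): only the first two of the
# six columns influence the label.
rng = np.random.RandomState(42)
X_demo = rng.randn(300, 6)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

scores = stability_scores(X_demo, y_demo)
threshold = 0.25  # hypothetical cut-off for keeping a feature
support = scores > threshold
print(scores)
print(support)
```

The boolean `support` array plays the same role as the result of `rlr.get_support()` and can be used to index the columns of the feature matrix. A smaller `C` means stronger L1 regularization and therefore fewer surviving features.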

data_xp=pd.DataFrame(data_x)
data_x = data_xp[data_xp.columns[rlr.get_support()]].values  # keep only the selected features (.as_matrix() was removed in pandas 1.0)
print(data_x[0])  # the number of features drops from 8 to 5

[   6.     148.      33.6      0.627   50.   ]

data_shape=data_x.shape
data_rows=data_shape[0]
data_cols=data_shape[1]

data_col_max=data_x.max(axis=0)  # column-wise maximum of the 2-D array
data_col_min=data_x.min(axis=0)  # column-wise minimum of the 2-D array
# min-max normalize every column of the input array to [0, 1]
data_x=(data_x-data_col_min)/(data_col_max-data_col_min)
print(data_x[0:2])

[[ 0.35294118  0.74371859  0.50074516  0.23441503  0.48333333]
 [ 0.05882353  0.42713568  0.39642325  0.11656704  0.16666667]]
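The normalization step above maps every column to the range [0, 1] with (x - min) / (max - min). As a hedged sketch, the same computation can be wrapped in a small helper that also guards against constant columns and can take its min/max from a reference set (for example, only the training rows). The name `min_max_scale`, the `reference` parameter, and the handling of constant columns are assumptions for illustration; the second and third rows of the toy array are made up.

```python
import numpy as np


def min_max_scale(x, reference=None):
    """Rescale each column of x to [0, 1] using the column-wise
    min and max of `reference` (or of x itself if none is given)."""
    x = np.asarray(x, dtype=float)
    ref = x if reference is None else np.asarray(reference, dtype=float)
    col_min = ref.min(axis=0)
    col_range = ref.max(axis=0) - col_min
    # Assumption: a constant column is mapped to 0 instead of
    # triggering a division by zero.
    col_range[col_range == 0] = 1.0
    return (x - col_min) / col_range


# Toy data: three rows, three columns.
demo = np.array([[6.0, 148.0, 50.0],
                 [1.0, 85.0, 31.0],
                 [8.0, 183.0, 32.0]])
scaled = min_max_scale(demo)
print(scaled)
```

Passing the training rows as `reference` when scaling the test rows keeps the test set from influencing the scaling statistics; scaled test values may then fall slightly outside [0, 1].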

n_train=int(len(data_y)*0.7)  # use the first 70% of the rows as the training set and the remaining 30% as the test set

X_train=data_x[:n_train]
y_train=data_y[:n_train]
X_test=data_x[n_train:]
y_test=data_y[n_train:]

clf1=svm.SVC(kernel='rbf', gamma=8, C=0.5)  # model 1: SVM; the hyperparameters C=0.5 and gamma=8 were chosen in an earlier post
clf1.fit(X_train,y_train)
clf2=LogisticRegression()  # model 2: logistic regression
clf2.fit(X_train,y_train)


y_predictions1=clf1.predict(X_test)
y_predictions2=clf2.predict(X_test)

k,h=0,0
for i in range(len(y_test)):
    if y_predictions1[i]==y_test[i]:
        k+=1
for i in range(len(y_test)):
    if y_predictions2[i]==y_test[i]:
        h+=1 
print(k,h)

(190, 184)

accuracy_svm=float(k)/float(len(y_test))
accuracy_LogR=float(h)/float(len(y_test))
print("The accuracy of SVM is %f, and the accuracy of LogisticRegression is %f" % (accuracy_svm, accuracy_LogR))

The accuracy of SVM is 0.822511, and the accuracy of LogisticRegression is 0.796537
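The counting loops above compute plain classification accuracy: the number of matching predictions divided by the number of test rows. A compact sketch with NumPy follows. The helper name `accuracy` and the toy labels are assumptions for illustration; the counts 190 and 184 come from the output above, and together with the printed accuracies they imply 231 test rows.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Share of positions where the prediction equals the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Toy labels: 3 of the 5 predictions match.
y_true = np.array([1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0])
toy_acc = accuracy(y_true, y_pred)

# Re-deriving the reported accuracies from the reported counts.
n_test = 231
svm_acc = 190 / float(n_test)
logr_acc = 184 / float(n_test)
print("%f %f %f" % (toy_acc, svm_acc, logr_acc))
```

With the arrays from the post, `accuracy(y_test, y_predictions1)` would replace the first counting loop and the later division in one call.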

Tags: feature optimization, SVM, machine learning
