Machine Learning Chapter 2 (Support Vector Machines)

程序员文章站 2022-05-22 09:05:33

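Theory

Introduction

In the previous chapter on linear classification we covered linear discriminant analysis (LDA). Its idea is:

Training: project the training samples onto a line so that the projections of same-class samples are as close together as possible and the projections of different-class samples are as far apart as possible.
Prediction: project the new sample onto the learned line and decide its class from where the projection falls.

LDA for two classes looks like this:

[Figure: linear discriminant analysis for two classes of samples]

Compared with LDA, another approach comes to mind naturally: find a line that directly separates the two classes, as in the figure below:

[Figure: several lines that each separate the two classes]

As the figure shows, many lines can separate the two classes. Which one should we choose, and how do we solve for it?
The support vector machine (SVM) covered in this chapter answers both questions.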

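How do we find the best separating line?

We continue from where linear classification and logistic regression left off.

To simplify the problem, we first assume that the samples are linearly separable and that the task is binary classification. (The non-separable and multi-class cases are covered later.)

[Figure: several candidate hyperplanes separating two classes, with the middle one drawn in red]

In the figure above, several hyperplanes (lines, in two dimensions) separate the two classes. Intuitively the best one is the red hyperplane in the middle. It lies halfway between the two classes, so it tolerates perturbations of the samples on either side best: under small perturbations it still separates them. In technical terms, it is the most robust and generalizes best.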

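Margin and support vectors

We stay within the framework of generalized linear classification and describe the separating hyperplane by the linear equation

$$w^T x + b = 0$$

Here $w = (w_1, w_2, \dots, w_d)$ is the normal vector, which determines the orientation of the hyperplane, and $b$ is the offset, which determines the distance between the hyperplane and the origin. The hyperplane is determined by $(w, b)$, so we denote it by $(w, b)$.

We change the label mapping used in logistic regression as follows:

$$y \in \{0, 1\} \;\to\; y \in \{-1, 1\}$$

The labels -1 and 1 still stand for the two classes; this choice only simplifies the notation and the later computation. To classify a new point $x$, compute $z = w^T x + b$: if $z < 0$, assign $x$ to class -1; if $z > 0$, assign it to class 1.
Relabeling the two sides of the classifier does not change the hyperplane we are solving for.
Define

$$g(z) = \begin{cases} 1, & z \geq 0 \\ -1, & z < 0 \end{cases}, \qquad z = w^T x + b$$

where $g$ is the mapping from the score $z$ to the label.

This linear mapping gives the following picture:

[Figure: separating hyperplane $w^T x + b = 0$ with the boundaries $w^T x + b = \pm 1$ through the closest samples]

Among the '+' samples, those closest to the separating hyperplane satisfy $w^T x + b = 1$; likewise, the closest '-' samples satisfy $w^T x + b = -1$. These points are called support vectors.
The distance from any point $x$ in the sample space to the separating hyperplane is

$$r = \frac{|w^T x + b|}{\|w\|}$$

Substituting the support vectors into this formula, the sum of the distances from the support vectors of the two classes to the hyperplane is

$$r = \frac{2}{\|w\|}$$

This distance is called the margin. Finding the $(w, b)$ that maximizes it gives the separating hyperplane we are looking for; in other words, our goal is the $(w, b)$ with the maximum margin.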
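The two formulas above are easy to check numerically. The following is a minimal sketch; the hyperplane `(w, b)` and the sample points are made-up values chosen only so that the arithmetic is easy to follow, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical hyperplane (w, b) and sample points, chosen only for illustration.
w = np.array([3.0, 4.0])
b = -5.0
X = np.array([[3.0, 4.0], [1.0, 0.5], [0.0, 0.0]])

def distance_to_hyperplane(X, w, b):
    """Distance |w^T x + b| / ||w|| for every row x of X."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

def margin(w):
    """Margin 2 / ||w|| between the boundaries w^T x + b = +1 and -1."""
    return 2.0 / np.linalg.norm(w)

distances = distance_to_hyperplane(X, w, b)
labels = np.where(X @ w + b >= 0, 1, -1)  # g(z): label from the sign of the score

print(distances)
print(labels)
print(margin(w))
```

Because the margin depends only on $\|w\|$, shrinking $\|w\|$ while keeping every sample on the correct side of its boundary is what widens the margin.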

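Explaining the point-to-hyperplane distance formula

Take a point $x$ in the space and let $x_0$ be its orthogonal projection onto the hyperplane. $w$ is a vector perpendicular to the hyperplane, and $r$ is the distance from the sample $x$ to the hyperplane, as shown in the figure:

[Figure: point $x$, its projection $x_0$ on the hyperplane, the normal vector $w$, and the distance $r$]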
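The geometric argument can be written out in a few lines. This is the standard derivation, sketched here under the setup above, with $r$ taken as a signed distance along the direction of $w$:

```latex
\begin{aligned}
x &= x_0 + r\,\frac{w}{\|w\|}, \qquad w^T x_0 + b = 0 \\
w^T x + b &= \left(w^T x_0 + b\right) + r\,\frac{w^T w}{\|w\|} = r\,\|w\| \\
r &= \frac{w^T x + b}{\|w\|}, \qquad |r| = \frac{|w^T x + b|}{\|w\|}
\end{aligned}
```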

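We have now turned the search for a separating hyperplane into a maximum margin problem: find the parameters $(w, b)$ that maximize the margin. Formally,

$$\max_{w, b} \; \frac{2}{\|w\|}$$

$$\text{s.t.} \quad y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \dots, m \quad \text{(constraints)}$$

Maximizing the margin means maximizing $\|w\|^{-1}$, which is equivalent to minimizing $\|w\|^2$, so we rewrite the problem as

$$\min_{w, b} \; \frac{1}{2} \|w\|^2$$

$$\text{s.t.} \quad y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \dots, m \quad \text{(constraints)}$$

This is the basic form of the support vector machine. The constraints summarize the relationship established above: every sample lies on or outside the boundary of its own class, with equality holding for the support vectors.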

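Converting to the dual

Let us analyze the problem of finding the maximum-margin $(w, b)$:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{(objective)}$$

$$\text{s.t.} \quad y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \dots, m \quad \text{(constraints)}$$

The objective is quadratic and the constraints are linear, so this is a convex quadratic programming problem. In short: optimize the objective, that is, minimize the loss, subject to the constraints. It can be solved with an off-the-shelf QP package, but the special structure of the problem lets us apply the method of Lagrange multipliers and obtain its dual problem.

Attach a Lagrange multiplier $\alpha_i \geq 0$ to each constraint and define the Lagrangian

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 + \sum_{i=1}^{m} \alpha_i \left(1 - y_i (w^T x_i + b)\right)$$

Setting the partial derivatives with respect to $w$ and $b$ to zero (with $\alpha$ fixed) gives

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad 0 = \sum_{i=1}^{m} \alpha_i y_i$$

Substituting these back into the Lagrangian yields

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

$$\text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad \alpha_i \geq 0, \quad i = 1, 2, \dots, m$$

The problem now involves only the variables $\alpha_i$. Once the $\alpha_i$ are found, $w$ and $b$ follow.

This transformation requires the KKT conditions to hold:

$$\begin{cases} \alpha_i \geq 0 \\ y_i f(x_i) - 1 \geq 0 \\ \alpha_i \left(y_i f(x_i) - 1\right) = 0 \end{cases}$$

The KKT conditions say that for every training sample either $\alpha_i = 0$ or $y_i f(x_i) = 1$. If $\alpha_i = 0$, the sample does not appear in the solution. If $\alpha_i > 0$, then $y_i f(x_i) = 1$ must hold, so the sample lies on the maximum-margin boundary and is a support vector. The final model therefore depends only on the support vectors.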
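These relations can be observed on a fitted model. The sketch below is only an illustration: the toy data is made up, and a very large `C` is used as an assumption to approximate the hard-margin case with `sklearn.svm.SVC`, whose `dual_coef_` attribute stores $\alpha_i y_i$ for the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 0.5],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Assumption: a very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i, for the support vectors only.
alpha_times_y = clf.dual_coef_[0]
support_vectors = clf.support_vectors_

# w = sum_i alpha_i y_i x_i, summed over the support vectors.
w_from_dual = alpha_times_y @ support_vectors
print("w from dual:", w_from_dual)
print("coef_:      ", clf.coef_[0])

# Support vectors should sit on the margin boundary: y_i f(x_i) close to 1.
margins = y[clf.support_] * clf.decision_function(support_vectors)
print("y_i f(x_i) on support vectors:", margins)
print("sum of alpha_i y_i:", alpha_times_y.sum())
```

Samples that are not support vectors never enter the sum, which is the numerical counterpart of $\alpha_i = 0$.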

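The usual way to solve the transformed problem is the SMO algorithm. We do not go into SMO here; interested readers can consult John C. Platt's 1998 paper.

The substitution into the Lagrangian goes as follows (writing the multiplier term with a plus sign or a minus sign gives the same result):

[Figure: step-by-step substitution of $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$ into the Lagrangian]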
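Written out, the substitution takes three lines. This is a sketch of the standard steps, using only $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$ from above:

```latex
\begin{aligned}
L(w, b, \alpha)
&= \frac{1}{2} w^T w + \sum_{i=1}^{m} \alpha_i
   - \sum_{i=1}^{m} \alpha_i y_i w^T x_i - b \sum_{i=1}^{m} \alpha_i y_i \\
&= \frac{1}{2} w^T w + \sum_{i=1}^{m} \alpha_i - w^T w - 0 \\
&= \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
   \alpha_i \alpha_j y_i y_j x_i^T x_j
\end{aligned}
```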

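From the linearly separable case to the non-separable case

Kernel functions

To classify a point $x$ with the hyperplane, we substitute $x$ into $f(x) = w^T x + b$ and assign the class by the sign of the result. From the derivation above we have

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i$$

so the classification function is

$$f(x) = \left(\sum_{i=1}^{m} \alpha_i y_i x_i\right)^T x + b = \sum_{i=1}^{m} \alpha_i y_i \langle x_i, x \rangle + b$$

Once $\alpha$ is known, predicting a new point $x$ only requires the inner products between $x$ and the training samples. This is important: it is the basic premise for the nonlinear extension with kernels later on.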
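The inner-product form can be reproduced from a fitted model. The following sketch assumes made-up toy data and uses `sklearn.svm.SVC` with a linear kernel; `decision_from_inner_products` is a hypothetical helper written for this illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 0.5],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
clf = SVC(kernel="linear", C=1.0).fit(X, y)

def decision_from_inner_products(clf, X_new):
    """f(x) = sum_i alpha_i y_i <x_i, x> + b, summed over the support vectors."""
    gram = clf.support_vectors_ @ X_new.T   # <x_i, x> for every pair
    return clf.dual_coef_[0] @ gram + clf.intercept_[0]

X_new = np.array([[1.5, 1.0], [4.0, 4.5], [3.0, 3.0]])
manual = decision_from_inner_products(clf, X_new)
print(manual)
print(clf.decision_function(X_new))
```

Only the matrix of inner products touches the data, so replacing it with another similarity function changes the model without changing the rest of the computation.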

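The maximum margin hyperplane classifier obtained above is the support vector machine. So far our SVM is still rather weak: it only handles the linearly separable case, where some separating hyperplane classifies all training samples correctly. In practice the samples may not be linearly separable, so no such hyperplane exists. What then?

[Figure: samples that cannot be separated by a line in the input space]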

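An intuitive mathematical view of kernel functions

Suppose the data we want to model is hard to separate linearly in its input space. We can then map the data into a higher-dimensional feature space, where a hyperplane that separates the data may exist. (It can be proved that if the input space is finite-dimensional, there always exists a higher-dimensional feature space in which the samples are separable.)

Starting from the linear expression

$$f(x) = \left(\sum_{i=1}^{m} \alpha_i y_i x_i\right)^T x + b = \sum_{i=1}^{m} \alpha_i y_i \langle x_i, x \rangle + b$$

we map into the high-dimensional space with a mapping $\phi(\cdot)$ and obtain

$$f(x) = \sum_{i=1}^{m} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$$

Computing $\langle \phi(x_i), \phi(x) \rangle$ directly is hard, so we suppose there is a function satisfying

$$\kappa(x, x_i) = \langle \phi(x_i), \phi(x) \rangle$$

That is, the inner product of $x$ and $x_i$ in the feature space equals the value of the function $\kappa(\cdot, \cdot)$ computed in the original space. With such a function we never have to compute inner products in a high-dimensional, possibly infinite-dimensional, feature space. The function $\kappa(\cdot, \cdot)$ is called a kernel function.

The obvious question is how to find such a function $\kappa(\cdot, \cdot)$. A valid kernel has to satisfy a number of mathematical conditions, which we do not discuss in detail here.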

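Commonly used kernel functions:

Name	Expression	Description
Linear kernel	$K(x_1, x_2) = \langle x_1, x_2 \rangle$	This is just the inner product in the original space; the linear kernel mainly exists to keep the formulation uniform
Polynomial kernel	$K(x_1, x_2) = (\langle x_1, x_2 \rangle)^d$	$d$ is the degree of the polynomial
Gaussian kernel	$K(x_1, x_2) = \exp\left(-\frac{\|x_1 - x_2\|^2}{2\sigma^2}\right)$	The larger $\sigma$ is, the faster the weights on high-order features decay. With a small $\sigma$, arbitrary data can be mapped to be linearly separable, at the cost of severe overfitting

Kernels can also be combined into new kernels; we do not go into that here.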
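To make the kernel idea concrete, the sketch below implements the three kernels from the table and checks, for the degree-2 polynomial kernel in two dimensions, that the kernel value equals an inner product after an explicit feature map. The map `phi_degree2` and the sample vectors are assumptions introduced for this illustration.

```python
import numpy as np

def linear_kernel(x1, x2):
    return x1 @ x2

def polynomial_kernel(x1, x2, d=2):
    return (x1 @ x2) ** d

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def phi_degree2(x):
    """Explicit feature map of the degree-2 polynomial kernel in 2-D:
    phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_value = polynomial_kernel(x, z, d=2)       # computed in the input space
explicit = phi_degree2(x) @ phi_degree2(z)   # computed in the feature space
print(k_value, explicit)
print(gaussian_kernel(x, z, sigma=1.0))
```

The two printed polynomial values agree, yet the kernel never builds the three-dimensional vectors. For the Gaussian kernel the corresponding feature space is infinite-dimensional, so the explicit route is not available at all.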

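The sketch below summarizes the idea:

[Figure: summary sketch of the mapping from the input space to the feature space]

The input space corresponds to a finite-dimensional Euclidean space; the feature space corresponds to a Hilbert space, which can be infinite-dimensional.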

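Soft margin and regularization

When the samples are not linearly separable, the cause may simply be noise in the data. Data points that lie far from their normal position are called outliers. In our original SVM model, outliers can have a large effect, because the hyperplane is determined by only a few support vectors; if an outlier is among them, its influence is large.

One remedy is to look for a separating hyperplane in a feature space, but that requires a suitable kernel function. A suitable kernel is often hard to find, and even when one is found it is hard to tell whether it will overfit badly. So we borrow an idea from linear classification: allow some classification errors and add a regularization penalty term.

[Figure: soft margin, with a few samples violating the margin constraint]

The SVM formulations so far require every sample to be classified correctly; this is called a hard margin. A soft margin allows some samples to violate the constraint $y_i(w^T x_i + b) \geq 1$.

Naturally, as few samples as possible should violate the constraint. While still maximizing the margin, the objective becomes

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \zeta_{0/1}\left(y_i (w^T x_i + b) - 1\right)$$

This is the soft-margin support vector machine. Here $C > 0$ is a constant and $\zeta_{0/1}$ is the 0/1 loss function, which equals 1 when its argument is negative and 0 otherwise.

The larger $C$ is, the heavier the penalty on misclassification and the more the misclassified points matter; as $C$ goes to infinity, the problem becomes the hard-margin SVM.
The smaller $C$ is, the lighter the penalty on misclassification; misclassified points matter less and some classification errors are tolerated.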

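Just as with activation functions, there are several forms to choose from in place of the loss $\zeta_{0/1}$:

Name	Expression
Hinge loss	$\zeta_{hinge}(z) = \max(0, 1 - z)$
Exponential loss	$\zeta_{exp}(z) = \exp(-z)$
Logistic loss	$\zeta_{log}(z) = \log(1 + \exp(-z))$

[Figure: the 0/1 loss and the hinge, exponential, and logistic losses plotted against $z$]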
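The losses in the table are one-liners in NumPy. In this sketch $z$ stands for the functional margin $y_i f(x_i)$; the function names and the sample values of `z` are assumptions made for illustration.

```python
import numpy as np

def zero_one_loss(z):
    """1 where z < 0, else 0."""
    return (z < 0).astype(float)

def hinge_loss(z):
    return np.maximum(0.0, 1.0 - z)

def exponential_loss(z):
    return np.exp(-z)

def logistic_loss(z):
    return np.log1p(np.exp(-z))

z = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
print("0/1:        ", zero_one_loss(z))
print("hinge:      ", hinge_loss(z))
print("exponential:", exponential_loss(z))
print("logistic:   ", logistic_loss(z))
```

The hinge loss is exactly zero once $z \geq 1$, which is why samples outside the margin do not influence a hinge-loss SVM, while the exponential and logistic losses stay positive for every $z$.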

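The soft margin in mathematical terms

When the data is not linearly separable, some samples fail the constraint $y_i(w^T x_i + b) \geq 1$ (functional margin at least 1).
The constraint is therefore relaxed to

$$y_i (w^T x_i + b) \geq 1 - \xi_i$$

That is, the functional margin plus $\xi_i$ must be at least 1. The $\xi_i \geq 0$ are called slack variables, and the objective above can be rewritten as

$$\min_{w, b, \xi_i} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

with the constraints

$$\text{s.t.} \quad y_i (w^T x_i + b) \geq 1 - \xi_i$$

$$\xi_i \geq 0, \quad i = 1, 2, \dots, m$$

Applying Lagrangian duality again gives the final dual problem:

[Figure: dual problem and KKT conditions of the soft-margin SVM]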
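For reference, the standard soft-margin dual and its KKT conditions can be stated as follows. This is the textbook result (as in Zhou Zhihua's 《机器学习》, where the stationarity condition in $\xi_i$ is numbered (6.39) and the KKT conditions (6.41)), written in this article's notation; $\mu_i \geq 0$ denotes the multiplier of the constraint $\xi_i \geq 0$.

```latex
\begin{aligned}
&\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j \\
&\text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad 0 \leq \alpha_i \leq C,
  \quad i = 1, 2, \dots, m \\[4pt]
&\text{stationarity in } \xi_i: \quad C = \alpha_i + \mu_i \\[4pt]
&\text{KKT:} \quad
\begin{cases}
\alpha_i \geq 0, \quad \mu_i \geq 0 \\
y_i f(x_i) - 1 + \xi_i \geq 0 \\
\alpha_i \left(y_i f(x_i) - 1 + \xi_i\right) = 0 \\
\xi_i \geq 0, \quad \mu_i \xi_i = 0
\end{cases}
\end{aligned}
```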

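From the KKT conditions (6.41), which include $\alpha_i (y_i f(x_i) - 1 + \xi_i) = 0$ and $\mu_i \xi_i = 0$, where $\mu_i$ is the multiplier of the constraint $\xi_i \geq 0$:

If $\alpha_i = 0$, the sample has no effect on the final model.
If $\alpha_i > 0$, then $y_i f(x_i) = 1 - \xi_i$ and the sample is a support vector. Since $0 \leq \alpha_i \leq C$:
- If $0 < \alpha_i < C$, then $\mu_i > 0$, so by (6.41) $\xi_i = 0$ and $y_i f(x_i) = 1$: the sample lies on the maximum-margin boundary.
- If $\alpha_i = C$, then by (6.39), $C = \alpha_i + \mu_i$, we get $\mu_i = 0$, and
  - if $0 < \xi_i < 1$, then $y_i f(x_i) > 0$: the prediction agrees with the label, the sample lies inside the margin and is classified correctly
  - if $\xi_i = 1$, then $y_i f(x_i) = 0$: the sample lies on the separating hyperplane
  - if $\xi_i > 1$, then $y_i f(x_i) < 0$: the sample lies on the other side of the separating hyperplane and is misclassified

[Figure: positions of support vectors relative to the margin in the soft-margin SVM]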
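The case analysis above can be observed on a fitted model. The sketch below is an illustration under stated assumptions: the overlapping data is randomly generated, `C = 1.0` is arbitrary, and $\alpha_i$ is recovered as the absolute value of `dual_coef_` from `sklearn.svm.SVC`.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping two-class data (randomly generated for illustration).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(40, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(40, 2))])
y = np.array([-1] * 40 + [1] * 40)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# Slack xi_i = max(0, 1 - y_i f(x_i)) for every training sample.
functional_margin = y * clf.decision_function(X)
xi = np.maximum(0.0, 1.0 - functional_margin)

# alpha_i = |alpha_i y_i| for support vectors, 0 for all other samples.
alpha = np.zeros(len(X))
alpha[clf.support_] = np.abs(clf.dual_coef_[0])

on_margin = (alpha > 1e-8) & (alpha < C - 1e-8)   # 0 < alpha_i < C
at_bound = alpha >= C - 1e-8                      # alpha_i = C
misclassified = xi > 1                            # y_i f(x_i) < 0

print("support vectors:", len(clf.support_))
print("on the margin boundary:", on_margin.sum())
print("inside the margin or misclassified:", at_bound.sum())
print("misclassified:", misclassified.sum())
```

Raising `C` in this sketch penalizes slack more heavily, which typically reduces the number of samples with $\alpha_i = C$.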

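Support vector regression

So far we have used SVMs for classification, but they can also be used for regression. The basic idea of support vector regression (SVR) is to tolerate a deviation of at most $\epsilon$ between $f(x)$ and $y$: a loss is counted only when the absolute difference between $f(x)$ and $y$ exceeds $\epsilon$.

[Figure: support vector regression]

From the KKT conditions:
The multipliers $\alpha_i$ and $\hat{\alpha}_i$ can take nonzero values only for samples that do not fall inside the $\epsilon$-band.

$\alpha_i$ can be nonzero if and only if $f(x_i) - y_i - \epsilon - \xi_i = 0$
$\hat{\alpha}_i$ can be nonzero if and only if $y_i - f(x_i) - \epsilon - \hat{\xi}_i = 0$
Samples inside the band satisfy both $\alpha_i = 0$ and $\hat{\alpha}_i = 0$
The equalities $f(x_i) - y_i - \epsilon - \xi_i = 0$ and $y_i - f(x_i) - \epsilon - \hat{\xi}_i = 0$ cannot hold at the same time, so at least one of $\alpha_i$ and $\hat{\alpha}_i$ is zero

[Figure: support vector regression]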
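The $\epsilon$-insensitive idea amounts to a small loss function. A minimal sketch follows; the function name and the sample values are assumptions for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-band, |y - f(x)| - epsilon outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 1.0, 1.0, 1.0])
y_pred = np.array([1.05, 0.9, 1.5, 0.0])
loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(loss)
```

The first two predictions lie within $\epsilon$ of the target and cost nothing; only the last two contribute, by the amount they exceed the band.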

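Summary

Learning flowchart

[Figure: flowchart of the topics covered in this chapter]

Strengths and weaknesses of SVM

The SVM is essentially a nonlinear method. When the sample size is small, it readily captures nonlinear relationships between data and features, so it can solve nonlinear problems, avoids the structure-selection and local-minimum problems of neural networks, improves generalization, and copes with high-dimensional problems.

SVMs are sensitive to missing data, and there is no general-purpose solution for nonlinear problems: the kernel function has to be chosen carefully. The computational cost is high. Mainstream algorithms are $O(n^2)$, which makes SVMs weak on large-scale data. In addition, there are two hyperparameters with a large effect on the result (with the Gaussian kernel, $C$ and $\gamma$). They cannot be computed by probabilistic methods and have to be found by exhaustive search with validation, so the computation takes far longer than for many comparable nonlinear classifiers.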
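The exhaustive search over $C$ and $\gamma$ mentioned above is commonly done with cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on the iris dataset; the candidate values in `param_grid` are arbitrary assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Candidate values are arbitrary and only for illustration.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# Every (C, gamma) pair is scored with 5-fold cross-validation.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validation accuracy: %.2f" % search.best_score_)
print("test accuracy: %.2f" % search.score(X_test, y_test))
```

With 16 parameter pairs and 5 folds this already trains 80 models, which illustrates why the search becomes expensive on large datasets.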

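SVM in practice with Python

This part uses the svm package in sklearn.

Datasets

The datasets are again the diabetes dataset and the iris dataset.

SVM for classification

Linear classification SVM

The LinearSVC class in the svm package implements a linear support vector classifier. It is built on liblinear and handles both binary and multi-class classification.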

Function prototype

    sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=1e-4, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)

Parameter	Description
C	float, optional (default=1.0). Penalty parameter.
loss	string, 'hinge' or 'squared_hinge' (default='squared_hinge'). The hinge loss or the squared hinge loss.
penalty	string, 'l1' or 'l2' (default='l2'). Regularization norm; 'l1' yields sparser weights.
dual	bool (default=True). If True, solve the dual problem; if False, solve the primal problem. Prefer False when n_samples > n_features.
tol	float, optional (default=1e-4). Tolerance for the stopping criterion.
multi_class	string, 'ovr' or 'crammer_singer' (default='ovr'). Multi-class strategy. 'ovr': one-vs-rest. 'crammer_singer': joint multi-class objective, rarely used.
fit_intercept	boolean, optional (default=True). Whether to fit the intercept, i.e. the parameter b.
intercept_scaling	float, optional (default=1). When fit_intercept is True, a synthetic constant feature with this value is appended to every sample.
class_weight	{dict, 'balanced'}, optional. Weights of the individual classes.
verbose	int (default=0). Whether to enable verbose output.
random_state	int seed, RandomState instance, or None (default=None). Random number generator to use.
max_iter	int (default=1000). Maximum number of iterations.


Attribute	Description
coef_	array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]. Weights.
intercept_	array, shape = [1] if n_classes == 2 else [n_classes]. Bias.


Method	Description
fit(X,y)	Train the model
predict(X)	Predict with the model and return the predictions
score(X,y[,sample_weight])	Return the prediction accuracy on X, y

Program summary

Consider the following program:

# -*- coding: utf-8 -*-
"""
    LinearSVC
"""
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

def load_data_classfication():
    '''
    Load the dataset for the classification problem.

    :return: a tuple for the classification problem, whose elements are, in order:
    training samples, test samples, training labels, test labels
    '''
    iris=datasets.load_iris() # the iris dataset bundled with scikit-learn
    X_train=iris.data
    y_train=iris.target

    # stratified split into training and test sets; the test set is 1/4 of the data
    return train_test_split(X_train, y_train,test_size=0.25,
        random_state=0,stratify=y_train)


def test_LinearSVC(*data):
    '''
    Test the basic usage of LinearSVC.

    :param data: variable arguments; a tuple whose elements are, in order:
    training samples, test samples, training labels, test labels
    :return:  None
    '''
    X_train, X_test, y_train, y_test = data
    cls = svm.LinearSVC()
    cls.fit(X_train, y_train)
    print('Coefficients:%s, intercept %s'%(cls.coef_, cls.intercept_))
    print('Score: %.2f' % cls.score(X_test, y_test))

def test_LinearSVC_loss(*data):
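    '''
    Test how the loss function affects the predictive performance of LinearSVC.

    :param data:  variable arguments; a tuple whose elements are, in order:
    training samples, test samples, training labels, test labels
    :return:  None
    '''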
    X_train,X_test,y_train,y_test=data
    losses=['hinge','squared_hinge']
    for loss in losses:
        cls = svm.LinearSVC(loss=loss)
        cls.fit(X_train,y_train)
        print("Loss:%s" %loss)
        print('Coefficients:%s, intercept %s'%(cls.coef_, cls.intercept_))
        print('Score: %.2f' % cls.score(X_test, y_test))

def test_LinearSVC_L12(*data):
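    '''
    Test how the form of regularization affects the predictive performance of LinearSVC.

    :param data:  variable arguments; a tuple whose elements are, in order:
    training samples, test samples, training labels, test labels
    :return:  None
    '''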
    X_train,X_test,y_train,y_test=data
    L12=['l1','l2']
    for p in L12:
        cls=svm.LinearSVC(penalty=p,dual=False)
        cls.fit(X_train,y_train)
        print("penalty:%s"%p)
        print('Coefficients:%s, intercept %s'%(cls.coef_,cls.intercept_))
        print('Score: %.2f' % cls.score(X_test, y_test))

def test_LinearSVC_C(*data):
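    '''
    Test how the parameter C affects the predictive performance of LinearSVC.

    :param data: variable arguments; a tuple whose elements are, in order:
    training samples, test samples, training labels, test labels
    :return:   None
    '''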
    X_train,X_test,y_train,y_test=data
    Cs=np.logspace(-2,1)
    train_scores=[]
    test_scores=[]
    for C in Cs:
        cls = svm.LinearSVC(C=C)
        cls.fit(X_train,y_train)
        train_scores.append(cls.score(X_train,y_train))
        test_scores.append(cls.score(X_test,y_test))

    ## plotting
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(Cs, train_scores, label="Training score")
    ax.plot(Cs, test_scores, label="Testing score")
    ax.set_xlabel(r"C")
    ax.set_ylabel(r"score")
    ax.set_xscale('log')
    ax.set_title("LinearSVC")
    ax.legend(loc='best')
    plt.show()

if __name__=="__main__":
    X_train,X_test,y_train,y_test=load_data_classfication() # build the classification dataset
    # test_LinearSVC(X_train,X_test,y_train,y_test) # call test_LinearSVC
    #test_LinearSVC_loss(X_train,X_test,y_train,y_test) # call test_LinearSVC_loss
    #test_LinearSVC_L12(X_train,X_test,y_train,y_test) # call test_LinearSVC_L12
    test_LinearSVC_C(X_train,X_test,y_train,y_test) # call test_LinearSVC_C

Output

Calling test_LinearSVC

    Coefficients:[[ 0.20958771  0.39923413 -0.81739032 -0.4423166 ]
     [-0.12727239 -0.78629251  0.51923075 -1.02165378]
     [-0.80302137 -0.87629997  1.21355254  1.80971049]], intercept [ 0.11973674  2.03860961 -1.44451052]
    Score: 0.97

Calling test_LinearSVC_loss

    Loss:hinge
    Coefficients:[[ 0.36644823  0.32153865 -1.07539266 -0.57006165]
     [ 0.46930202 -1.55476008  0.40439748 -1.35516144]
     [-1.21258611 -1.15286359  1.84890006  1.98419065]], intercept [ 0.1805379   1.34466693 -1.42744115]
    Score: 0.97
    Loss:squared_hinge
    Coefficients:[[ 0.20959966  0.39924609 -0.81739156 -0.44231601]
     [-0.12494968 -0.7851915   0.51757814 -1.02401011]
     [-0.80312638 -0.87597978  1.21372878  1.80998711]], intercept [ 0.11974022  2.03041493 -1.44417346]
    Score: 0.97

Calling test_LinearSVC_L12

    penalty:l1
    Coefficients:[[ 0.16622076  0.51868466 -0.9346618   0.        ]
     [-0.15340139 -0.90804423  0.48273882 -0.9338501 ]
     [-0.55291492 -0.85823785  0.94885692  2.34050752]], intercept [ 0.          2.58558288 -2.63447548]
    Score: 0.95
    penalty:l2
    Coefficients:[[ 0.20966721  0.39922563 -0.81739423 -0.44237657]
     [-0.13079574 -0.7872181   0.52298032 -1.02445961]
     [-0.80308922 -0.87656106  1.21391169  1.81021937]], intercept [ 0.11945388  2.04805235 -1.44409296]
    Score: 0.97

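Calling test_LinearSVC_C
C measures how much the misclassified points matter: the larger C is, the more weight they carry.

[Figure: "LinearSVC", training score and testing score versus C on a log-scaled x-axis]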

Nonlinear classification SVM

Here we use the sklearn.svm.SVC class. SVC implements a nonlinear support vector classifier and is built on libsvm.

Function prototype

    sklearn.svm.SVC( C=1.0, kernel='rbf', degree=3, gamma='auto',
                 coef0=0.0, shrinking=True, probability=False,
                 tol=1e-3, cache_size=200, class_weight=None,
                 verbose=False, max_iter=-1, decision_function_shape=None,
                 random_state=None)

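Parameter	Description
C	float, optional (default=1.0). Penalty parameter.
kernel	string, optional (default='rbf'). Kernel to use; must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable.
degree	int, optional (default=3). Degree of the polynomial; only used when the kernel is 'poly'.
gamma	float, optional (default='auto'). Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. 'auto' means 1/n_features.
tol	float, optional (default=1e-3). Tolerance for the stopping criterion.
coef0	float, optional (default=0.0). Independent term of the kernel function; used by 'poly' and 'sigmoid'.
probability	boolean, optional (default=False). If True, enable probability estimates. It must be set before training, and it slows training down.
shrinking	boolean, optional (default=True). If True, use the shrinking heuristic.
class_weight	{dict, 'balanced'}, optional. Weights of the individual classes.
verbose	bool (default=False). Whether to enable verbose output.
random_state	int seed, RandomState instance, or None (default=None). Random number generator to use.
max_iter	int (default=-1). Maximum number of iterations; -1 means no limit.
cache_size	float, optional. Size of the kernel cache, in MB.
decision_function_shape	'ovo', 'ovr' or None (default=None). 'ovr': return a one-vs-rest decision function of shape (n_samples, n_classes). 'ovo': return a one-vs-one decision function of shape (n_samples, n_classes * (n_classes - 1) / 2). Internally SVC always trains one binary SVM per pair of classes, n_classes * (n_classes - 1) / 2 in total, and combines them into a multi-class SVM.


Attribute	Description
coef_	array, shape = [n_class * (n_class-1) / 2, n_features]. Coefficient of each feature; only available with the linear kernel.
intercept_	array, shape = [n_class * (n_class-1) / 2]. Constant term of the decision function.
support_	array-like, shape = [n_SV]. Indices of the support vectors.
support_vectors_	array-like, shape = [n_SV, n_features]. The support vectors.
n_support_	array-like, dtype=int32, shape = [n_class]. Number of support vectors for each class.
dual_coef_	array, shape = [n_class-1, n_SV]. Coefficient of each support vector in the decision function of the dual problem.


Method	Description
fit(X,y)	Train the model
predict(X)	Predict with the model and return the predictions
score(X,y[,sample_weight])	Return the prediction accuracy on X, y
predict_log_proba(X)	Return an array whose entries are the log-probabilities of X belonging to each class
predict_proba(X)	Return an array whose entries are the probabilities of X belonging to each class

Program summary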

# -*- coding: utf-8 -*-
"""
    SVC
"""
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

def load_data_classfication():
    '''
    Load the dataset for the classification problem.

    :return: a tuple for the classification problem, whose elements are, in order:
    training samples, test samples, training labels, test labels
    '''
    iris=datasets.load_iris()# the iris dataset bundled with scikit-learn
    X_train=iris.data
    y_train=iris.target
    # stratified split into training and test sets; the test set is 1/4 of the data
    return train_test_split(X_train, y_train,test_size=0.25,
        random_state=0,stratify=y_train)

def test_SVC_linear(*data):
    '''
    Test the basic usage of SVC, here with the simplest kernel, the linear kernel.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    cls=svm.SVC(kernel='linear')
    cls.fit(X_train,y_train)
    print('Coefficients:%s, intercept %s'%(cls.coef_,cls.intercept_))
    print('Score: %.2f' % cls.score(X_test, y_test))

def test_SVC_poly(*data):
    '''
    Test how degree, gamma and coef0 affect the predictive performance of SVC with a polynomial kernel.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    fig=plt.figure()
    ### test degree ####
    degrees=range(1,20)
    train_scores=[]
    test_scores=[]
    for degree in degrees:
        cls=svm.SVC(kernel='poly',degree=degree)
        cls.fit(X_train,y_train)
        train_scores.append(cls.score(X_train,y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax=fig.add_subplot(1,3,1) # one row, three columns
    ax.plot(degrees,train_scores,label="Training score ",marker='+' )
    ax.plot(degrees,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVC_poly_degree ")
    ax.set_xlabel("p")
    ax.set_ylabel("score")
    ax.set_ylim(0,1.05) # fix the y-axis range to [0, 1.05]
    ax.legend(loc="best",framealpha=0.5)

    ### test gamma, with degree fixed at 3 ####
    gammas=range(1,20)
    train_scores=[]
    test_scores=[]
    for gamma in gammas:
        cls=svm.SVC(kernel='poly',gamma=gamma,degree=3)
        cls.fit(X_train,y_train)
        train_scores.append(cls.score(X_train,y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax=fig.add_subplot(1,3,2)
    ax.plot(gammas,train_scores,label="Training score ",marker='+' )
    ax.plot(gammas,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVC_poly_gamma ")
    ax.set_xlabel(r"$\gamma$")
    ax.set_ylabel("score")
    ax.set_ylim(0,1.05)
    ax.legend(loc="best",framealpha=0.5)

    ### test r, with gamma fixed at 10 and degree fixed at 3 ######
    rs=range(0,20)
    train_scores=[]
    test_scores=[]
    for r in rs:
        cls=svm.SVC(kernel='poly',gamma=10,degree=3,coef0=r)
        cls.fit(X_train,y_train)
        train_scores.append(cls.score(X_train,y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax=fig.add_subplot(1,3,3)
    ax.plot(rs,train_scores,label="Training score ",marker='+' )
    ax.plot(rs,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVC_poly_r ")
    ax.set_xlabel(r"r")
    ax.set_ylabel("score")
    ax.set_ylim(0,1.05)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()

def test_SVC_rbf(*data):
    '''
    Test how gamma affects the predictive performance of SVC with a Gaussian kernel.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    gammas=range(1,20)
    train_scores=[]
    test_scores=[]
    for gamma in gammas:
        cls=svm.SVC(kernel='rbf',gamma=gamma)
        cls.fit(X_train,y_train)
        train_scores.append(cls.score(X_train,y_train))
        test_scores.append(cls.score(X_test, y_test))
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(gammas,train_scores,label="Training score ",marker='+' )
    ax.plot(gammas,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVC_rbf")
    ax.set_xlabel(r"$\gamma$")
    ax.set_ylabel("score")
    ax.set_ylim(0,1.05)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()

def test_SVC_sigmoid(*data):
    '''
    Test how gamma and coef0 affect the predictive performance of SVC with a sigmoid kernel.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    fig=plt.figure()

    ### test gamma, with coef0 fixed at 0 ####
    gammas=np.logspace(-2,1)
    train_scores=[]
    test_scores=[]

    for gamma in gammas:
        cls=svm.SVC(kernel='sigmoid',gamma=gamma,coef0=0)
        cls.fit(X_train,y_train)
        train_scores.append(cls.score(X_train,y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax=fig.add_subplot(1,2,1)
    ax.plot(gammas,train_scores,label="Training score ",marker='+' )
    ax.plot(gammas,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVC_sigmoid_gamma ")
    ax.set_xscale("log")
    ax.set_xlabel(r"$\gamma$")
    ax.set_ylabel("score")
    ax.set_ylim(0,1.05)
    ax.legend(loc="best",framealpha=0.5)
    ### test r, with gamma fixed at 0.01 ######
    rs=np.linspace(0,5)
    train_scores=[]
    test_scores=[]

    for r in rs:
        cls=svm.SVC(kernel='sigmoid',coef0=r,gamma=0.01)
        cls.fit(X_train,y_train)
        train_scores.append(cls.score(X_train,y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax=fig.add_subplot(1,2,2)
    ax.plot(rs,train_scores,label="Training score ",marker='+' )
    ax.plot(rs,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVC_sigmoid_r ")
    ax.set_xlabel(r"r")
    ax.set_ylabel("score")
    ax.set_ylim(0,1.05)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()


if __name__=="__main__":
    X_train,X_test,y_train,y_test=load_data_classfication() # build the classification dataset
    # test_SVC_linear(X_train,X_test,y_train,y_test) # call test_SVC_linear
    # test_SVC_poly(X_train,X_test,y_train,y_test) # call test_SVC_poly
    # test_SVC_rbf(X_train,X_test,y_train,y_test) # call test_SVC_rbf
    test_SVC_sigmoid(X_train,X_test,y_train,y_test) # call test_SVC_sigmoid

Output

Calling test_SVC_linear

    Coefficients:[[-0.16990304  0.47442881 -0.93075307 -0.51249447]
     [ 0.02439178  0.21890135 -0.52833486 -0.25913786]
     [ 0.52289771  0.95783924 -1.82516872 -2.00292778]], intercept [ 2.0368826  1.1512924  6.3276538]
    Score: 1.00

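Testing the polynomial kernel: $\kappa(x, x_i) = (\gamma \langle x, x_i \rangle + \tau)^p$.

The parameter $p$ is set by degree
The parameter $\gamma$ is set by gamma
The parameter $\tau$ is set by coef0

The results are as follows:
[Figure: three panels "SVC_poly_degree", "SVC_poly_gamma", "SVC_poly_r"; training and testing score versus p, $\gamma$ and r]

Testing the Gaussian kernel: $\kappa(x, x_i) = \exp(-\gamma \|x - x_i\|^2)$

The parameter $\gamma$ is set by gamma

The results are as follows:

[Figure: "SVC_rbf"; training and testing score versus $\gamma$]

Testing the sigmoid kernel: $\kappa(x, x_i) = \tanh(\gamma \langle x, x_i \rangle + \tau)$

The parameter $\gamma$ is set by gamma
The parameter $\tau$ is set by coef0

The results are as follows:

[Figure: two panels "SVC_sigmoid_gamma" and "SVC_sigmoid_r"; training and testing score versus $\gamma$ (log scale) and r]

The results are poor.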
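How `gamma`, `coef0` and `degree` enter the three kernels can be checked against scikit-learn's pairwise kernel functions, which follow the same parametrization as the `kernel` options of SVC. The sample vectors and parameter values below are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

x = np.array([[1.0, 2.0]])
xi = np.array([[0.5, -1.0]])
gamma, tau, p = 0.5, 1.0, 3   # gamma, coef0, degree

inner = (x @ xi.T).item()          # <x, x_i>
sq_dist = np.sum((x - xi) ** 2)    # ||x - x_i||^2

poly_manual = (gamma * inner + tau) ** p
rbf_manual = np.exp(-gamma * sq_dist)
sigmoid_manual = np.tanh(gamma * inner + tau)

poly_sklearn = polynomial_kernel(x, xi, degree=p, gamma=gamma, coef0=tau)[0, 0]
rbf_sklearn = rbf_kernel(x, xi, gamma=gamma)[0, 0]
sigmoid_sklearn = sigmoid_kernel(x, xi, gamma=gamma, coef0=tau)[0, 0]

print(poly_manual, poly_sklearn)
print(rbf_manual, rbf_sklearn)
print(sigmoid_manual, sigmoid_sklearn)
```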

SVR for regression

Linear regression SVR

Here we use the sklearn.svm.LinearSVR class. LinearSVR implements a linear support vector regressor and is built on liblinear.

Function prototype

    sklearn.svm.LinearSVR(epsilon=0.0, tol=1e-4, C=1.0,loss='epsilon_insensitive', fit_intercept=True,intercept_scaling=1., dual=True, verbose=0,random_state=None, max_iter=1000)

Parameter	Description
C	float, optional (default=1.0). Penalty parameter.
loss	string, 'epsilon_insensitive' or 'squared_epsilon_insensitive' (default='epsilon_insensitive'). 'epsilon_insensitive' is the standard SVR loss $L_\epsilon$; 'squared_epsilon_insensitive' is its square, $L_\epsilon^2$.
epsilon	float, optional (default=0.0). The $\epsilon$ parameter of the loss.
dual	bool (default=True). If True, solve the dual problem; if False, solve the primal problem. Prefer False when n_samples > n_features.
tol	float, optional (default=1e-4). Tolerance for the stopping criterion.
fit_intercept	boolean, optional (default=True). Whether to fit the intercept, i.e. the parameter b.
intercept_scaling	float, optional (default=1). When fit_intercept is True, a synthetic constant feature with this value is appended to every sample.
verbose	int (default=0). Whether to enable verbose output.
random_state	int seed, RandomState instance, or None (default=None). Random number generator to use.
max_iter	int (default=1000). Maximum number of iterations.


Attribute	Description
coef_	array, shape = [n_features]. Weights.
intercept_	array, shape = [1]. Bias.


Method	Description
fit(X,y)	Train the model
predict(X)	Predict with the model and return the predictions
score(X,y[,sample_weight])	Return the coefficient of determination $R^2$ of the prediction on X, y

Program summary

# -*- coding: utf-8 -*-
"""
    LinearSVR
"""
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

def load_data_regression():
    '''
    Load the dataset for the regression problem.
    :return: a tuple for the regression problem, whose elements are, in order:
    training samples, test samples, training targets, test targets
    '''
    diabetes = datasets.load_diabetes() # the diabetes dataset bundled with scikit-learn
    # split into training and test sets; the test set is 1/4 of the data
    return train_test_split(diabetes.data,diabetes.target,
        test_size=0.25,random_state=0)

def test_LinearSVR(*data):
    '''
    Test the basic usage of LinearSVR.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    regr=svm.LinearSVR()
    regr.fit(X_train,y_train)
    print('Coefficients:%s, intercept %s'%(regr.coef_,regr.intercept_))
    print('Score: %.2f' % regr.score(X_test, y_test))

def test_LinearSVR_loss(*data):
    '''
       Test how the choice of loss function affects the predictive performance of LinearSVR.
    :param data:
    :return:
    '''
    X_train,X_test,y_train,y_test=data
    losses=['epsilon_insensitive','squared_epsilon_insensitive']
    for loss in losses:
        regr=svm.LinearSVR(loss=loss)
        regr.fit(X_train,y_train)
        print("loss：%s"%loss)
        print('Coefficients:%s, intercept %s'%(regr.coef_,regr.intercept_))
        print('Score: %.2f' % regr.score(X_test, y_test))

def test_LinearSVR_epsilon(*data):
    '''
    Test how the epsilon parameter affects the predictive performance of LinearSVR.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    epsilons=np.logspace(-2,2)
    train_scores=[]
    test_scores=[]
    for  epsilon in  epsilons:
        regr=svm.LinearSVR(epsilon=epsilon,loss='squared_epsilon_insensitive')
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train, y_train))
        test_scores.append(regr.score(X_test, y_test))
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(epsilons,train_scores,label="Training score ",marker='+' )
    ax.plot(epsilons,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "LinearSVR_epsilon ")
    ax.set_xscale("log")
    ax.set_xlabel(r"$\epsilon$")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1.05)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()

def test_LinearSVR_C(*data):
    '''
    Test how the C parameter affects the predictive performance of LinearSVR.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    Cs=np.logspace(-1,2)
    train_scores=[]
    test_scores=[]
    for  C in  Cs:
        regr=svm.LinearSVR(epsilon=0.1,loss='squared_epsilon_insensitive',C=C)
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train, y_train))
        test_scores.append(regr.score(X_test, y_test))
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(Cs,train_scores,label="Training score ",marker='+' )
    ax.plot(Cs,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "LinearSVR_C ")
    ax.set_xscale("log")
    ax.set_xlabel(r"C")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1.05)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()

if __name__=="__main__":
    X_train,X_test,y_train,y_test=load_data_regression() # build the regression dataset
    # test_LinearSVR(X_train,X_test,y_train,y_test) # call test_LinearSVR
    # test_LinearSVR_loss(X_train,X_test,y_train,y_test) # call test_LinearSVR_loss
    # test_LinearSVR_epsilon(X_train,X_test,y_train,y_test) # call test_LinearSVR_epsilon
    test_LinearSVR_C(X_train,X_test,y_train,y_test) # call test_LinearSVR_C

Output

Testing the LinearSVR linear support vector regressor:

    Coefficients:[ 2.14940259  0.4418875   6.35258779  4.62357282  2.82085901  2.42005063
     -5.3367464   5.41765142  7.26812843  4.33778867], intercept [ 99.]
    Score: -0.56

The result is poor.

Effect of the loss function on predictive performance:

squared_epsilon_insensitive performs somewhat better.

    loss：epsilon_insensitive
    Coefficients:[ 2.14940259  0.4418875   6.35258779  4.62357282  2.82085901  2.42005063
     -5.3367464   5.41765142  7.26812843  4.33778867], intercept [ 99.]
    Score: -0.56
    loss：squared_epsilon_insensitive
    Coefficients:[   7.05593116 -103.3282818   395.67307025  221.76243025  -11.08017954
      -63.55554232 -176.67840376  117.55891822  322.63894067   95.61734901], intercept [ 152.37383103]
    Score: 0.38

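Effect of $\epsilon$ on performance:

[Figure: "LinearSVR_epsilon"; training and testing score versus $\epsilon$ on a log-scaled x-axis]

Not great either.

Effect of the penalty coefficient C on performance:

[Figure: "LinearSVR_C"; training and testing score versus C on a log-scaled x-axis]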

Nonlinear regression SVR

sklearn.svm.SVR implements a nonlinear support vector regressor and is built on libsvm.

Function prototype

    sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto', coef0=0.0,tol=1e-3, C=1.0, epsilon=0.1, shrinking=True,cache_size=200, verbose=False, max_iter=-1)

Parameter	Description
C	float, optional (default=1.0). Penalty parameter.
kernel	string, optional (default='rbf'). Kernel to use; must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable.
degree	int, optional (default=3). Degree of the polynomial; only used when the kernel is 'poly'.
gamma	float, optional (default='auto'). Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. 'auto' means 1/n_features.
tol	float, optional (default=1e-3). Tolerance for the stopping criterion.
coef0	float, optional (default=0.0). Independent term of the kernel function; used by 'poly' and 'sigmoid'.
epsilon	float, optional (default=0.1). The $\epsilon$ of the $\epsilon$-insensitive loss.
shrinking	boolean, optional (default=True). If True, use the shrinking heuristic.
verbose	bool (default=False). Whether to enable verbose output.
max_iter	int (default=-1). Maximum number of iterations; -1 means no limit.
cache_size	float, optional. Size of the kernel cache, in MB.


Attribute	Description
coef_	array, shape = [1, n_features]. Coefficient of each feature; only available with the linear kernel.
intercept_	array, shape = [1]. Constant term of the decision function.
support_	array-like, shape = [n_SV]. Indices of the support vectors.
support_vectors_	array-like, shape = [n_SV, n_features]. The support vectors.
n_support_	array-like, dtype=int32. Number of support vectors.
dual_coef_	array, shape = [1, n_SV]. Coefficient of each support vector in the decision function of the dual problem.


Method	Description
fit(X,y)	Train the model
predict(X)	Predict with the model and return the predictions
score(X,y[,sample_weight])	Return the coefficient of determination $R^2$ of the prediction on X, y

Program summary

# -*- coding: utf-8 -*-
"""
    SVR
"""
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

def load_data_regression():
    '''
    Load the dataset for the regression problem.
    :return: a tuple for the regression problem, whose elements are, in order:
    training samples, test samples, training targets, test targets
    '''
    diabetes = datasets.load_diabetes()
    return train_test_split(diabetes.data,diabetes.target,
        test_size=0.25,random_state=0)

def test_SVR_linear(*data):
    '''
    Test the basic usage of SVR, here with the simplest kernel, the linear kernel.
    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    regr=svm.SVR(kernel='linear')
    regr.fit(X_train,y_train)
    print('Coefficients:%s, intercept %s'%(regr.coef_,regr.intercept_))
    print('Score: %.2f' % regr.score(X_test, y_test))

def test_SVR_poly(*data):
    '''
    Test how degree, gamma and coef0 affect the predictive performance of SVR with a polynomial kernel.

    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    fig=plt.figure()
    ### test degree ####
    degrees=range(1,20)
    train_scores=[]
    test_scores=[]
    for degree in degrees:
        regr=svm.SVR(kernel='poly',degree=degree,coef0=1)
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train,y_train))
        test_scores.append(regr.score(X_test, y_test))
    ax=fig.add_subplot(1,3,1)
    ax.plot(degrees,train_scores,label="Training score ",marker='+' )
    ax.plot(degrees,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVR_poly_degree r=1")
    ax.set_xlabel("p")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1.)
    ax.legend(loc="best",framealpha=0.5)

    ### test gamma, with degree fixed at 3 and coef0 fixed at 1 ####
    gammas=range(1,40)
    train_scores=[]
    test_scores=[]
    for gamma in gammas:
        regr=svm.SVR(kernel='poly',gamma=gamma,degree=3,coef0=1)
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train,y_train))
        test_scores.append(regr.score(X_test, y_test))
    ax=fig.add_subplot(1,3,2)
    ax.plot(gammas,train_scores,label="Training score ",marker='+' )
    ax.plot(gammas,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVR_poly_gamma  r=1")
    ax.set_xlabel(r"$\gamma$")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1)
    ax.legend(loc="best",framealpha=0.5)
    ### test r, with gamma fixed at 20 and degree fixed at 3 ######
    rs=range(0,20)
    train_scores=[]
    test_scores=[]
    for r in rs:
        regr=svm.SVR(kernel='poly',gamma=20,degree=3,coef0=r)
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train,y_train))
        test_scores.append(regr.score(X_test, y_test))
    ax=fig.add_subplot(1,3,3)
    ax.plot(rs,train_scores,label="Training score ",marker='+' )
    ax.plot(rs,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVR_poly_r gamma=20 degree=3")
    ax.set_xlabel(r"r")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1.)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()

def test_SVR_rbf(*data):
    '''
    Test how gamma affects the predictive performance of SVR with a Gaussian kernel.

    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    gammas=range(1,20)
    train_scores=[]
    test_scores=[]
    for gamma in gammas:
        regr=svm.SVR(kernel='rbf',gamma=gamma)
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train,y_train))
        test_scores.append(regr.score(X_test, y_test))
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(gammas,train_scores,label="Training score ",marker='+' )
    ax.plot(gammas,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVR_rbf")
    ax.set_xlabel(r"$\gamma$")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()

def test_SVR_sigmoid(*data):
    '''
    Test how gamma and coef0 affect the predictive performance of SVR with a sigmoid kernel.

    :param data:
    :return: None
    '''
    X_train,X_test,y_train,y_test=data
    fig=plt.figure()

    ### test gamma, with coef0 fixed at 0.01 ####
    gammas=np.logspace(-1,3)
    train_scores=[]
    test_scores=[]

    for gamma in gammas:
        regr=svm.SVR(kernel='sigmoid',gamma=gamma,coef0=0.01)
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train,y_train))
        test_scores.append(regr.score(X_test, y_test))
    ax=fig.add_subplot(1,2,1)
    ax.plot(gammas,train_scores,label="Training score ",marker='+' )
    ax.plot(gammas,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVR_sigmoid_gamma r=0.01")
    ax.set_xscale("log")
    ax.set_xlabel(r"$\gamma$")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1)
    ax.legend(loc="best",framealpha=0.5)
    ### test r, with gamma fixed at 10 ######
    rs=np.linspace(0,5)
    train_scores=[]
    test_scores=[]

    for r in rs:
        regr=svm.SVR(kernel='sigmoid',coef0=r,gamma=10)
        regr.fit(X_train,y_train)
        train_scores.append(regr.score(X_train,y_train))
        test_scores.append(regr.score(X_test, y_test))
    ax=fig.add_subplot(1,2,2)
    ax.plot(rs,train_scores,label="Training score ",marker='+' )
    ax.plot(rs,test_scores,label= " Testing  score ",marker='o' )
    ax.set_title( "SVR_sigmoid_r gamma=10")
    ax.set_xlabel(r"r")
    ax.set_ylabel("score")
    ax.set_ylim(-1,1)
    ax.legend(loc="best",framealpha=0.5)
    plt.show()

if __name__=="__main__":
    X_train,X_test,y_train,y_test=load_data_regression() # build the regression dataset
    # test_SVR_linear(X_train,X_test,y_train,y_test) # call test_SVR_linear
    # test_SVR_poly(X_train,X_test,y_train,y_test) # call test_SVR_poly
    # test_SVR_rbf(X_train,X_test,y_train,y_test) # call test_SVR_rbf
    test_SVR_sigmoid(X_train,X_test,y_train,y_test) # call test_SVR_sigmoid

Output

Calling test_SVR_linear

    Coefficients:[[ 2.24127622 -0.38128702  7.87018376  5.21135861  2.26619436  1.70869458
      -5.7746489   5.51487251  7.94860817  4.59359657]], intercept [ 137.11012796]
    Score: -0.03

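Testing the polynomial kernel: $\kappa(x, x_i) = (\gamma \langle x, x_i \rangle + \tau)^p$.

The parameter $p$ is set by degree
The parameter $\gamma$ is set by gamma
The parameter $\tau$ is set by coef0

The results are as follows:

[Figure: three panels "SVR_poly_degree r=1", "SVR_poly_gamma r=1", "SVR_poly_r gamma=20 degree=3"; training and testing score versus p, $\gamma$ and r]

A bit better than the SVM results.

Testing the Gaussian kernel: $\kappa(x, x_i) = \exp(-\gamma \|x - x_i\|^2)$

The parameter $\gamma$ is set by gamma

The results are as follows:

[Figure: "SVR_rbf"; training and testing score versus $\gamma$]

Testing the sigmoid kernel: $\kappa(x, x_i) = \tanh(\gamma \langle x, x_i \rangle + \tau)$

The parameter $\gamma$ is set by gamma
The parameter $\tau$ is set by coef0

The results are as follows:

[Figure: two panels "SVR_sigmoid_gamma r=0.01" and "SVR_sigmoid_r gamma=10"; training and testing score versus $\gamma$ (log scale) and r]

The results are mediocre.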

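References

July, "支持向量机通俗导论（理解SVM的三层境界）" (A Plain-Language Introduction to Support Vector Machines: Three Levels of Understanding SVM), blog post

周志华 (Zhou Zhihua), 《机器学习》 (Machine Learning)
华校专 (Hua Xiaozhuan), 《Python大战机器学习》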
