欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

机器学习复习:浅析支持向量机(SVM)

程序员文章站 2022-06-16 23:13:09
...


注意:!!CSDN显示不太正确(latex环境部分不支持)
请移步博客园(同样是本人):浅析支持向量机(SVM)!!


brief introduction

information

​ 支持向量机(Support Vector Machine,以下简称SVM),是一个二元分类( dualistic classification)的广义线性分类器(generalized linear classifier),通过寻找分离超平面作为决策边界(decision boundary),分离少量的支持向量(support vector),从而达到分类目的[1][2][3][1][2][3]

​ 可采用一对一(One Versus One)、一对多(One Versus Rest)等策略转变为多分类问题[6][6]

​ 原问题(primal problem)仅支持硬间隔最大化(hard margin maximum),添加松弛变量(slack variable)后支持软间隔最大化(soft margin maximum)。


details

属性:稀疏性和稳健性(Robust)[1][1]、非参数模型(nonparametric models)、监督学习(supervised learning)、判别模型(discriminant model)、【KKT条件(Karush-Kuhn-Tucker condition)约束,对偶形式(dual form)转换,序列最小优化(Sequential Minimal Optimization,以下简称为SMO)算法求解[1][4][5][1][4][5]】、支持核方法(kernel method)。

求解:使用门页损失函数(hinge loss function)计算经验风险(empirical risk)并在求解时加入了正则化项以优化结构风险(structural risk),1.直接进行二次规划(Quadratic Programming)求解;2.利用拉格朗日算子( Lagrange multipliers),将其转为,符合KKT条件(Karush-Kuhn-Tucker condition)的对偶形式(dual form),再进行二次规划(Quadratic Programming)求解[1][1]

扩展:利用正则化、概率学、结构化、核方法改进算法,包括偏斜数据、概率SVM、最小二乘SVM(Least Square SVM, LS-SVM)、结构化SVM(structured SVM)、多核SVM(multiple kernel SVM);也可扩充到回归(Support Vector Regression)、聚类、半监督学习(Semi-Supervised SVM, S3VM)。


problems

  • generalized formula:
    • 间隔距离(support vector distance):最优解时,值等于2w\frac{2}{\| w \|};
    • 分割线-原点垂直距离: 最优解时,值等于bw\frac{b}{\| w \|};
    • 简单推导:
      • svdistance:{WTxi++b=1WTxi+b=0WTxi+b=1{WT(xixj)=0W(xixj)W(x+x)x+=x+λWWTx++b=1λWTW=2,λ=2WTWmaximize  x+xx+x=λW=2W\color{black}{s\,v\,distance\,:} \begin{aligned} &\begin{cases} W^{\rm{T}}x_i^++b=1\\ W^{\rm{T}}x_i+b=0\\ W^{\rm{T}}x_i^-+b=-1\\ \end{cases} \\ &\downarrow \\ &\begin{cases} W^{\rm{T}}(x_i-x_j)= 0 &\rightarrow W\bot (x_i-x_j)\\ W\| (x^+-x^-) &\rightarrow x^+=x^-+\lambda W\\ W^{\rm{T}}x^++b=1 &\rightarrow \lambda W^{\rm{T}}W=2, \,|\lambda| =\frac{2}{|W^{\rm{T}}W|}\\ \end{cases} \\ &\downarrow \\ &maximize\;\|x^+-x^-\| \\ &\rightarrow \|x^+-x^-\| = \|\lambda W\|=\frac{2}{\|W\|} \end{aligned}

primal problem

当样本数据集线性可分(linear separable)时,寻找正负样本中各自距离对方最近的样本数据(称为支持向量,即SVM的名字由来)(一般为三个),利用相对位置(排除坐标缩放影响,也是采用几何间隔的原因),构建最大间隔(几何间隔)进行分类[3]。

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-AyQGUpA6-1583252206630)(SVM.assets/image-20200301163059913.png)]

from:greedyai.comfrom:greedyai.com

如图,当硬间隔最大化时,正类支持向量(positive support vector)(图中x1x_1)与负类支持向量(negative support vector)(图中x2,x3x_2, x_3)使

  • 约束条件:yi(Wxi+b)1y_i(W^{\top}x_i+b)\geq 1
  • 经验风险函数(Hinge loss):[1yi(Wxi+b)]+[1-y_i(W^{\top}x_i+b)]_+
  • 目标函数:min(w,b)12W2min_{(w,b)}\,\frac{1}{2}\|W\|^2
  • 预测:h(x)=sign(Wx+b)h(x)=sign(W^{\top}x+b)

signsign表示符号函数,即计算括号内的值,值为正数则取正类别,反之亦然。

(求解推导请查看下方dual problem内容)


multi-class

  • 多元分类(类似于LR多标签分类的策略)(Ck2>k,  when  k>3C_k^2>k,\;when\;k>3
    • OVR(one versus rest):k元分类问题,训练kk个模型,每个模型二分类为某个类和非该类,预测时选择maxiwiTx  max_i\,w_i^{\rm{T}}x\;xx的类别
    • OVO(one versus one):k元分类问题,训练Ck2C_k^2个模型,每个模型二分类为k元中二元组合,预测时选择计票次数最多的iixx的类别

slack problem

​ 硬间隔最大化无法满足实际条件中,异常值间隔、线性不可分等情况,因此引入软间隔(soft margin)方式: 在原问题基础上,添加松弛变量(slack variable),增加容错率。

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-w71XtggB-1583252206632)(SVM.assets/image-20200301171522294.png)]

from:greedyai.comfrom:greedyai.com

  • 约束条件:yi(WTxi+b)1ξi\begin{aligned} &y_i(W^{\rm{T}}x_i+b)\geq 1\color{red}{-\xi_i} \end{aligned}
  • 经验风险函数(Hinge loss):
    yi(WTx+b)1ξi{ξi1yi(WTx+b)ξi0ξi=[1yi(WTx+b)]+\begin{aligned} y_i(W^{\rm{T}}x+b)\geq 1-{\color{red}{\xi_i}} \rightarrow &\begin{cases} {\color{red}{\xi_i}}\geq 1-y_i(W^{\rm{T}}x+b) \\ {\color{red}{\xi_i}}\geq 0 \end{cases}\\ &\downarrow \\ {\color{red}{\xi_i}}=&[1-y_i(W^{\rm{T}}x+b)]_+ \end{aligned}
  • 目标函数:$min_{(w,b,\color{red}{\xi\geq 0})},\frac{1}{2}|W|^2+\color{red}{C\sum_{i}\xi_i} \$
  • 预测:h(x)=sign(WX+b)h(x)=sign(W^{\top}X+b)

(求解推导请查看下方dual problem内容)


Duality

  • 原问题转为对偶形式再求解的好处:[9][9]
    • 把约束条件和待优化目标融合在一个表达式,便于求解;
    • 对偶问题一般是凹函数,便于求全局最优解(global optimal);
    • 对偶形式,便于引入核技巧;

KKT condition

  • 成立条件:

    • f(W)=WTWf(W)=W^{\rm{T}}W is convex,
    • {gi(W)hi(W)=αiTW+b\begin{cases}g_i(W)\\h_i(W)=\alpha_i^{\rm{T}}W+b\end{cases} is affine,
    • W,igi(W)<0\exists W,\,\forall_i g_i(W)<0,
  • 内容(具体问题中有不同表现形式):
    {wiL(w,α,β)=0i=1,,dβiL(w,α,β)=0i=1,,lαgi(w)=0i=1,,kgi(w)0i=1,,kα0i=1,,kα>0,gi(w)=0 \begin{cases} \frac{\partial }{\partial w_i}L(w^\star,\alpha^\star,\beta^\star)=0 & i=1,\dots,d \\ \frac{\partial }{\partial \beta_i}L(w^\star,\alpha^\star,\beta^\star)=0 & i=1,\dots,l \\ {\color{red}{\alpha^\star g_i(w^\star)=0}}& i=1,\dots,k\\ g_i(w^\star)\leq 0 & i=1,\dots,k \\ \alpha^\star \geq 0 & i=1,\dots,k \\ \end{cases} \rightarrow \alpha^\star>0, g_i(w^\star)=0

留意α>0,gi(w)=0\alpha^\star>0, g_i(w^\star)=0,即非支持向量的样本令gi(w)0g_i(w^\star)\neq 0,可得α=0\alpha^\star=0


dual form normal processing

  • 原优化问题:
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &min_w\;f(w) \…

  • 拉格朗日函数(添加α,β\alpha,\beta算子):
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &{\cal{L}(w,\a…

    • 约束情况:
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ \theta_{\cal{p…

dual form transformation

primal to dual

(from greedyai.com)

  • 已知: 原问题数学模型

KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &min_{(w,b)}\,…

  • 改造:符合对偶形式一般流程(dual form normal processing)
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ g_i(w)= -y_i(W…

  • 转换:拉格朗日函数
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ {\cal{L}(w,\al…

  • 去除:未知值w,bw,b求解
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ \nabla_w{\cal{…

  • 代入:将3.1,3.23.1,3.2代入拉格朗日函数
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ {\cal{L}(w,b,\…

  • dual问题:
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ max_\alpha\;W(…

  • 推导bb:(homework)
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ b=-\frac{1}{2}…

  • 决策子:
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ f(w) &= W^\top…

  • 补充:当前形式的KKT条件(参考[9][9] -1)
    {αi=0yi(WTx+b)1αi>0yi(WTx+b)=1 \begin{cases} &\alpha_i&=&0 &\Rightarrow &y_i(W^{\rm{T}}x+b) &\geq 1 \\ &\alpha_i&>&0 &\Rightarrow &y_i(W^{\rm{T}}x+b)&= 1 \\ \end{cases}

KKT condition分析,知:

  • 非支持向量的数据样本(sample)可令α=0\alpha=0;即:
    • 若该样本为非支持向量(此时gi(w)0g_i(w)\leq 0),则按学习速率***最小化结构风险***;
    • 若该样本为支持向量(此时gi(w)=0g_i(w)=0),则根据正则化系数***平衡经验风险和结构风险***[1]。

slack to dual

(from greedyai.com)4:19

  • 已知: 原问题数学模型

KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &min_{(w,b,\co…

  • 改造:符合对偶形式一般流程(dual form normal processing)
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ g_i(w)= -y_i(W…

  • 转换:拉格朗日函数
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ {\cal{L}(w,\al…

  • 去除:未知值w,b,ξw,b,\xi求解
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ \nabla_w{\cal{…

  • 代入:将3.4,3.5,3.63.4,3.5,3.6代入拉格朗日函数
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ {\cal{L}(w,b,\…

  • dual问题:
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ max_{\alpha}\;…

  • 推导bb:(同上)

  • 决策子:(同上)

  • 补充:当前形式的KKT条件(参考[5][5] -7)
    {αi=0yi(WTx+b)1αi=Cyi(WTx+b)10<αi<Cyi(WTx+b)=1 \begin{cases}&\alpha_i&=&0 &\Rightarrow &y_i(W^{\rm{T}}x+b) &\geq 1 \\&\alpha_i&=&C &\Rightarrow &y_i(W^{\rm{T}}x+b)&\leq 1 \\0&<&\alpha_i&<C&\Rightarrow &y_i(W^{\rm{T}}x+b)&= 1 \\\end{cases}


dual form explanation

由上可得dual 问题,模型:
KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ max_{\alpha}\;…
模型中元素(蓝色、绿色、橙色标注):
KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ max &\begin{ca…

(感觉有点模糊,可能此处反应出SVM模型潜在变量和支持向量的α>0\alpha>0有关)


solving problem

  • 使用Quadratic Programming

coordinate descent method [5][5]

(from greedyai.com)

  • 沿坐标方向(每次仅一个特征变量变化)轮流进行搜索的寻优方法,故又称坐标轮换法;
  • 使用函数值,不使用导数,故是较简单的方法
机器学习复习:浅析支持向量机(SVM)

from [5][5]

  • 迭代公式

    for i in range(n):
        """
        X: sample data matrix
        alpha: Lagrangian multiplier
        d: coordinate direction
        """
        # i^th sample k^th feature
        X[i][k] = X[i-1][k]+alpha[i][k]*d[i][k]
        
        d[i][k] = e[i]
    
  • 收敛依据
    xnkx0kε \|x_n^k-x_0^k\|\leq \varepsilon

  • 通用流程,无约束优化方法——坐标轮换法


Sequential Minimal Optimization [5][5]

CS 229, Autumn 2009 - The Simplified SMO Algorithm, SVM-w-SMO,

(SMO是一种坐标下降法[1][1] )

  • 选择:待固定权重αi,αj\alpha_i,\alpha_j(启发式搜索)

  • 判断:判断权重结果,不符合则重新选择
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &k={\cal{K}}(i…

  • 更新:合适地更新权重αi,αj\alpha_i,\alpha_j
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &\alpha_j^{new…

  • 更新:合适地更新bb
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ b=&(b_x+b_y)/2…

  • 收敛:满足KKT条件


kernel method

  • frequent kernel (推荐阅读Kernel Functions for Machine Learning Applications, CSDN-总结一下遇到的各种核函数中前半部分)

    • Polynomial Kernel
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &{\cal{K}}(x_i…

    • Gaussian Kernel(need:feature standardization)
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &{\cal{K}}(x_i…

    • Sigmoid Kernel(equal:NN without hidden layer)
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &{\cal{K}}(x_i…

    • Cosine Similarity Kernel(used:text similarity)
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &{\cal{K}}(x_i…

    • Chi-squared Kernel
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ SVM:\;&{\cal{K…

  • Kerel condition
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ gram\;&G_{i,j}…

kernel svm

  • 直观模型
    KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ &{\color{gray}…

  • kernel trick

    • 关于正定核KaTeX parse error: Undefined control sequence: \cal at position 2: {\̲c̲a̲l̲{K}_i}的函数可以转为关于另一个正定核KaTeX parse error: Undefined control sequence: \cal at position 2: {\̲c̲a̲l̲{K}_j}的函数

    • 预测:
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ {\color{gray}{…

    • 高维映射(空间定义可参看B站-3Blue1Brown-微积分的本质)
      KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ polynomial\;ke…

  • kernel & basis expansion(compared)


#.reference links

重点推荐[5][6][10][5][6][10]

相关标签: 机器学习复习