
Support Vector Machine

What’s SVM

Find the optimal decision boundary such that, among the points of the two classes, the points closest to the boundary are as far away from it as possible (max margin), where $margin = 2d$.
The basic formulation solves the linearly separable problem; there are two variants:

Hard Margin SVM

Soft Margin SVM


The optimization problem behind the SVM

The distance from a point to the separating hyperplane is
$$d = \frac{\left| w^{T} \cdot x+b\right|}{\left\| w \right\|},\qquad \left\| w \right\| = \sqrt{w_{1}^{2}+w_{2}^{2}+\cdots+w_{n}^{2}}$$
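
As a quick sanity check of this formula, here is a toy 2-D example of my own (not from the original post):

import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weights
b = -5.0                   # hypothetical intercept
x = np.array([2.0, 1.0])   # a sample point

np.abs(w @ x + b) / np.linalg.norm(w)   # |w^T·x + b| / ||w|| = |3*2 + 4*1 - 5| / 5 = 1.0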

For the classification result, we want
$$\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\|}\geq d & \forall\, y^{(i)}=1 \\ \\ \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\|}\leq -d & \forall\, y^{(i)}=-1 \end{matrix}\right.$$

Dividing both sides by $d$ gives

$$\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\| d}\geq 1 & \forall\, y^{(i)}=1 \\ \\ \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\| d}\leq -1 & \forall\, y^{(i)}=-1 \end{matrix}\right.$$
The denominator $\left\| w \right\| d$ is a constant, so it can be absorbed into the parameters:
$$\left\{\begin{matrix} w_{d}^{T}\cdot x^{(i)}+b_{d}\geq 1 & \forall\, y^{(i)}=1 \\ \\ w_{d}^{T}\cdot x^{(i)}+b_{d}\leq -1 & \forall\, y^{(i)}=-1 \end{matrix}\right.$$
In other words, renaming $w_{d}, b_{d}$ back to $w, b$, we have
$$\left\{\begin{matrix} w^{T}\cdot x^{(i)}+b\geq 1 & \forall\, y^{(i)}=1 \\ \\ w^{T}\cdot x^{(i)}+b\leq -1 & \forall\, y^{(i)}=-1 \end{matrix}\right.$$

In the end, we have
$$y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1$$
So if we want to maximize $d$, we just need to maximize $\frac{\left| w^{T} \cdot x+b\right|}{\left\| w \right\|}$ for the points on the margin. For every sample the numerator satisfies
$$\left| w^{T}\cdot x^{(i)}+b \right|\geq 1$$
with equality for the support vectors (the points lying on the margin), so $d = \frac{1}{\left\| w \right\|}$, and we just need to maximize $\frac{1}{\left\| w \right\|}$, which is equivalent to minimizing $\left\| w \right\|$.

In the end, the optimization problem is
$$\begin{matrix} \min\ \ \frac{1}{2}\left\| w \right\|^{2} \\ \\ s.t.\ \ y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1 \end{matrix}$$
(We minimize $\frac{1}{2}\left\| w \right\|^{2}$ rather than $\left\| w \right\|$ because it is differentiable and leads to the same solution.)
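
As a small illustration (a sketch on assumed toy data, not part of the original post), the margin width $\frac{2}{\left\| w \right\|}$ can be read off a fitted linear SVM:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# toy, (nearly) linearly separable data
X_toy, y_toy = make_blobs(n_samples=40, centers=2, random_state=0)

svc_toy = LinearSVC(C=1e9, max_iter=10000)   # very large C ~ hard margin
svc_toy.fit(X_toy, y_toy)

w = svc_toy.coef_[0]
2 / np.linalg.norm(w)   # margin width 2/||w||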

Soft Margin SVM

In some cases the data is not linearly separable, so each sample is allowed a slack variable $\zeta_{i}$:
$$\min\ \ \frac{1}{2}\left\| w \right\|^{2}+C\sum_{i=1}^{m}\zeta_{i}\qquad s.t.\ \ y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta_{i},\ \ \zeta_{i}\geq 0$$
We call this the L1-norm soft margin. Alternatively, we can use the L2 norm:
$$\min\ \ \frac{1}{2}\left\| w \right\|^{2}+C\sum_{i=1}^{m}\zeta_{i}^{2}\qquad s.t.\ \ y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta_{i},\ \ \zeta_{i}\geq 0$$
The parameter $C$ controls the degree of fault tolerance: the larger $C$ is, the more heavily margin violations are penalized; the smaller $C$ is, the more violations are tolerated.
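
To see what $C$ does in practice, here is a hedged sketch (toy data of my own) that counts margin violations $y^{(i)}(w^{T}\cdot x^{(i)}+b) < 1$ for a loose and a tight soft margin:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
X_toy = StandardScaler().fit_transform(X_toy)
y_signed = np.where(y_toy == 1, 1, -1)   # map labels {0, 1} to {-1, +1}

for C in (0.01, 1e9):
    svc = LinearSVC(C=C, max_iter=10000).fit(X_toy, y_toy)
    violations = np.sum(y_signed * svc.decision_function(X_toy) < 1)
    print(C, violations)   # a smaller C usually tolerates more violations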

Using SVM in scikit-learn

Before using SVM we should standardize the data, because the algorithm relies on distances between samples.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

iris = datasets.load_iris()

X = iris.data
y = iris.target

X = X[y<2,:2]
y = y[y<2]


plt.scatter(X[y==0,0], X[y==0,1], color='red')
plt.scatter(X[y==1,0], X[y==1,1], color='blue')
plt.show()

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X)
X_standard = standardScaler.transform(X)

from sklearn.svm import LinearSVC

svc = LinearSVC(C=1e9)   # a very large C approximates a hard margin
svc.fit(X_standard, y)


def plot_decision_boundary(model, axis):
    
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    
    plt.contourf(x0, x1, zz, cmap=custom_cmap)  # contourf does not use a 'linewidth' kwarg

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()
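
As an optional extension (my own sketch, continuing from the cells above), the margin lines $w^{T}\cdot x+b=\pm 1$ can be drawn from the fitted model's coef_ and intercept_:

w = svc.coef_[0]
b = svc.intercept_[0]

plot_x = np.linspace(-3, 3, 200)
# solve w0*x0 + w1*x1 + b = +1 and -1 for x1
up_y = (1 - b - w[0] * plot_x) / w[1]
down_y = (-1 - b - w[0] * plot_x) / w[1]

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.plot(plot_x, up_y, 'k--')
plt.plot(plot_x, down_y, 'k--')
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.ylim(-3, 3)
plt.show()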

Using polynomial features and kernel functions in SVM

What's a kernel function

Actually, we can convert the optimization problem into its dual form:
$$\begin{matrix} \max\ \sum_{i=1}^{m}a_{i}-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}a_{i}a_{j}y_{i}y_{j}\,x_{i}\cdot x_{j} \\ \\ s.t.\ \ 0\leqslant a_{i}\leqslant C,\ \ \sum_{i=1}^{m}a_{i}y_{i}=0 \end{matrix}$$
In the past, we used polynomial features to convert the example $x^{(i)}$ to $x'^{(i)}$ and $x^{(j)}$ to $x'^{(j)}$, and then calculated the product $x'^{(i)}\cdot x'^{(j)}$. Now we want a function whose input is $x^{(i)}, x^{(j)}$ and whose output is $x'^{(i)}\cdot x'^{(j)}$, so that the product is calculated directly:
$$K(x^{(i)},x^{(j)})=x'^{(i)}\cdot x'^{(j)}$$
This makes the code run faster and occupy less memory, since representing explicit polynomial features is expensive.
As long as a model needs to compute terms of the form $x_{i}\cdot x_{j}$, we can use a kernel function; the trick is not exclusive to SVM.
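
In scikit-learn, choosing a kernel is just a matter of the kernel parameter of SVC. A hedged sketch (using moons data like the one later in this post):

from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_moons, y_moons = datasets.make_moons(noise=0.15, random_state=666)

poly_kernel_svc = Pipeline([
    ("std_scaler", StandardScaler()),
    ("kernel_svc", SVC(kernel="poly", degree=3, coef0=1, C=1.0))
])
poly_kernel_svc.fit(X_moons, y_moons)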

Polynomial kernel function

Consider the degree-2 polynomial kernel
$$K(x, y)=(x\cdot y+1)^{2}$$
Expanding it,
$$\begin{aligned} K(x, y)=&\left(\sum_{i=1}^{n}x_{i}y_{i}+1\right)^{2} \\ =&\sum_{i=1}^{n}(x_{i}^{2})(y_{i}^{2})+\sum_{i=2}^{n}\sum_{j=1}^{i-1}(\sqrt{2}x_{i}x_{j})(\sqrt{2}y_{i}y_{j})+\sum_{i=1}^{n}(\sqrt{2}x_{i})(\sqrt{2}y_{i})+1 \end{aligned}$$
If we define
$$x' = (x_{n}^{2},\cdots,x_{1}^{2},\sqrt{2}x_{n}x_{n-1},\cdots,\sqrt{2}x_{n},\cdots,\sqrt{2}x_{1},1)$$
$$y' = (y_{n}^{2},\cdots,y_{1}^{2},\sqrt{2}y_{n}y_{n-1},\cdots,\sqrt{2}y_{n},\cdots,\sqrt{2}y_{1},1)$$
then we have
$$K(x,y)=x'\cdot y'$$
So we can directly calculate $K(x, y)=(x\cdot y+1)^{2}$ instead of constructing the polynomial features and computing $x'\cdot y'$.
Generally speaking, the polynomial kernel function is
$$K(x, y)=(x\cdot y+c)^{d}$$
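
A quick numerical check of this identity for $n=2$ (toy vectors of my own):

import numpy as np

def phi(v):
    # explicit degree-2 polynomial feature map for a 2-D vector
    v1, v2 = v
    return np.array([v1**2, v2**2,
                     np.sqrt(2) * v1 * v2,
                     np.sqrt(2) * v1,
                     np.sqrt(2) * v2,
                     1.0])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

(a @ b + 1) ** 2          # kernel value computed directly: 4.0
phi(a) @ phi(b)           # the same value via the explicit features: 4.0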

RBF kernel function

The Gaussian kernel function is
$$K(x,y)=e^{-\gamma \left\| x-y \right\|^{2}}$$
It is also called the RBF (Radial Basis Function) kernel. It maps every sample into an infinite-dimensional feature space.
Gaussian kernel: every data point acts as a landmark, so an $m \times n$ data set is mapped into an $m \times m$ one.

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-4, 5, 1)
x
array([-4, -3, -2, -1,  0,  1,  2,  3,  4])
y = np.array((x >= -2) & (x <= 2), dtype='int')
y
array([0, 0, 1, 1, 1, 1, 1, 0, 0])
plt.scatter(x[y==0], [0]*len(x[y==0]))
plt.scatter(x[y==1], [0]*len(x[y==1]))
plt.show()


def gaussian(x, l):
    # Gaussian (RBF) feature value of sample x with respect to landmark l
    gamma = 1.0
    return np.exp(-gamma * (x-l)**2)
l1, l2 = -1, 1

X_new = np.empty((len(x), 2))
for i, data in enumerate(x):
    X_new[i, 0] = gaussian(data, l1)
    X_new[i, 1] = gaussian(data, l2)
plt.scatter(X_new[y==0,0], X_new[y==0,1])
plt.scatter(X_new[y==1,0], X_new[y==1,1])
plt.show()


Gamma in RBF kernel

The parameter gamma is the coefficient $\gamma$ in the Gaussian kernel: the larger gamma is, the narrower each Gaussian bump becomes and the more locally the boundary fits the training points (risking overfitting); a smaller gamma gives a smoother boundary (risking underfitting).

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

X, y = datasets.make_moons(noise=0.15, random_state=666)

plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def RBFKernelSVC(gamma):
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=gamma))
    ])
svc = RBFKernelSVC(gamma=1)
svc.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
def plot_decision_boundary(model, axis):
    
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    
    plt.contourf(x0, x1, zz, cmap=custom_cmap)  # contourf does not use a 'linewidth' kwarg
plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma100 = RBFKernelSVC(gamma=100)
svc_gamma100.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=100, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma100, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma10 = RBFKernelSVC(gamma=10)
svc_gamma10.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=10, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma10, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma05 = RBFKernelSVC(gamma=0.5)
svc_gamma05.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.5, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma05, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma01 = RBFKernelSVC(gamma=0.1)
svc_gamma01.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma01, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


Solving regression problems with SVM

For regression we want as many training points as possible to fall inside the margin (an $\epsilon$-tube around the fitted line); the width of the tube is controlled by the hyperparameter epsilon.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

boston = datasets.load_boston()   # note: load_boston was removed in scikit-learn 1.2; on newer versions substitute another regression dataset
X = boston.data
y = boston.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)


from sklearn.svm import LinearSVR
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def StandardLinearSVR(epsilon=0.1):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('linearSVR', LinearSVR(epsilon=epsilon))
    ])

svr = StandardLinearSVR()
svr.fit(X_train, y_train)

svr.score(X_test, y_test)
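
The SVR imported above can be used in the same way with an RBF kernel; a hedged sketch (hyperparameters are illustrative, not tuned):

def StandardRBFSVR(epsilon=0.1, C=1.0):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('rbfSVR', SVR(kernel='rbf', epsilon=epsilon, C=C))
    ])

rbf_svr = StandardRBFSVR()
rbf_svr.fit(X_train, y_train)
rbf_svr.score(X_test, y_test)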
