Support Vector Machine

程序员文章站 2022-05-21 19:58:00

What’s SVM

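Find the optimal decision boundary: the one that maximizes the distance between the boundary and the closest points of each of the two classes (max margin). If d is the distance from the boundary to those closest points, then
$$\text{margin} = 2d$$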

Hard Margin SVM

Soft Margin SVM

The optimization problem behind SVM

$$d = \frac{\left| w^{T} \cdot x + b \right|}{\left\| w \right\|}, \quad \left\| w \right\| = \sqrt{w_{1}^{2} + w_{2}^{2} + \cdots + w_{n}^{2}}$$
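
The distance formula above can be checked numerically. This is a minimal sketch; the function name `distance_to_hyperplane` and the example boundary are assumptions made for illustration, not part of the original article.

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return abs(w @ x + b) / np.linalg.norm(w)

# Hypothetical 2-D boundary 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5
w_example = np.array([3.0, 4.0])
b_example = -5.0

# |0 + 0 - 5| / 5 = 1.0
d_origin = distance_to_hyperplane(w_example, b_example, [0.0, 0.0])
# |9 + 16 - 5| / 5 = 4.0
d_point = distance_to_hyperplane(w_example, b_example, [3.0, 4.0])
print(d_origin, d_point)
```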

For a margin of d on each side of the boundary, every sample must satisfy
$$\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)} + b}{\left\| w \right\|} \geq d & \forall y^{(i)} = 1 \\ \\ \frac{w^{T} \cdot x^{(i)} + b}{\left\| w \right\|} \leq -d & \forall y^{(i)} = -1 \end{matrix}\right.$$

Dividing both sides by d gives
$$\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)} + b}{\left\| w \right\| d} \geq 1 & \forall y^{(i)} = 1 \\ \\ \frac{w^{T} \cdot x^{(i)} + b}{\left\| w \right\| d} \leq -1 & \forall y^{(i)} = -1 \end{matrix}\right.$$
The denominator $\left\| w \right\| d$ is a constant, so writing $w_{d} = w / (\left\| w \right\| d)$ and $b_{d} = b / (\left\| w \right\| d)$ we have
$$\left\{\begin{matrix} w_{d}^{T} \cdot x^{(i)} + b_{d} \geq 1 & \forall y^{(i)} = 1 \\ \\ w_{d}^{T} \cdot x^{(i)} + b_{d} \leq -1 & \forall y^{(i)} = -1 \end{matrix}\right.$$
In other words, renaming $w_{d}$ and $b_{d}$ as $w$ and $b$, we have
$$\left\{\begin{matrix} w^{T} \cdot x^{(i)} + b \geq 1 & \forall y^{(i)} = 1 \\ \\ w^{T} \cdot x^{(i)} + b \leq -1 & \forall y^{(i)} = -1 \end{matrix}\right.$$
In the end, the two cases combine into a single constraint:
$$y^{(i)}(w^{T} \cdot x^{(i)} + b) \geq 1$$
So if we want to maximize d, we need to maximize the distance of the closest points x to the boundary:
$$\max \frac{\left| w^{T} \cdot x + b \right|}{\left\| w \right\|}$$
Every sample satisfies
$$\left| w^{T} \cdot x^{(i)} + b \right| \geq 1$$
with equality for the closest points (the support vectors), so the numerator equals 1 for them. We therefore only need to maximize $\frac{1}{\left\| w \right\|}$, which is equivalent to minimizing $\left\| w \right\|$.

In the end, the optimization problem is
$$\begin{matrix} \min \: \frac{1}{2}\left\| w \right\|^{2} \\ \\ s.t. \: \: y^{(i)}(w^{T} \cdot x^{(i)} + b) \geq 1 \end{matrix}$$
The square and the factor $\frac{1}{2}$ do not change the minimizer; they only make the objective easier to differentiate.
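
The rescaling step in the derivation can be illustrated with a small worked example. This is only a sketch: the toy points and the separating hyperplane `x1 + x2 - 2 = 0` are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical toy data: two linearly separable classes labelled +1 / -1
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y_toy = np.array([1, 1, -1, -1])

# A hypothetical separating hyperplane x1 + x2 - 2 = 0 (not yet rescaled)
w_raw = np.array([1.0, 1.0])
b_raw = -2.0

# d is the distance from the boundary to the closest point
signed_dist = (X_toy @ w_raw + b_raw) / np.linalg.norm(w_raw)
d = np.min(np.abs(signed_dist))

# Rescale as in the derivation: w_d = w / (||w|| d), b_d = b / (||w|| d)
scale = np.linalg.norm(w_raw) * d
w_d, b_d = w_raw / scale, b_raw / scale

# After rescaling every sample satisfies y_i (w_d^T x_i + b_d) >= 1,
# and the margin 2d equals 2 / ||w_d||
functional_margins = y_toy * (X_toy @ w_d + b_d)
margin = 2.0 / np.linalg.norm(w_d)
print(functional_margins, margin, 2 * d)
```

The closest points reach the value 1 exactly, which is why maximizing d reduces to minimizing the norm of w.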

Soft Margin SVM

In some cases the data is not linearly separable. We then allow each sample to violate the margin by a slack $\zeta_{i} \geq 0$ and penalize the total slack:
$$\min \: \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{m}\zeta_{i} \quad s.t. \: \: y^{(i)}(w^{T} \cdot x^{(i)} + b) \geq 1 - \zeta_{i}, \: \: \zeta_{i} \geq 0$$
We call this the L1 norm penalty, because the slacks enter linearly. We can also use the L2 norm penalty:
$$\min \: \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{m}\zeta_{i}^{2} \quad s.t. \: \: y^{(i)}(w^{T} \cdot x^{(i)} + b) \geq 1 - \zeta_{i}, \: \: \zeta_{i} \geq 0$$
The parameter C controls the degree of fault tolerance: the larger C is, the more heavily margin violations are penalized and the closer the model gets to a Hard Margin SVM; the smaller C is, the more violations are tolerated.
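
To make the slack variables concrete, the sketch below evaluates both objectives for a fixed (w, b). The helper `soft_margin_objective` and the toy data are assumptions for illustration; the sketch uses the fact that, for fixed (w, b), the smallest feasible slack is max(0, 1 - y_i (w^T x_i + b)).

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C, squared=False):
    """Return (objective, slacks) of the soft margin problem for a given (w, b)."""
    # Smallest slack that satisfies y_i (w^T x_i + b) >= 1 - zeta_i and zeta_i >= 0
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    penalty = np.sum(slacks ** 2) if squared else np.sum(slacks)
    return 0.5 * (w @ w) + C * penalty, slacks

# Hypothetical toy data; the last point lies on the wrong side of the boundary
X_soft = np.array([[2.0, 2.0], [0.0, 0.0], [1.5, 1.5]])
y_soft = np.array([1, -1, -1])
w_soft = np.array([0.5, 0.5])
b_soft = -1.0

obj_l1, zeta = soft_margin_objective(w_soft, b_soft, X_soft, y_soft, C=1.0)
obj_l2, _ = soft_margin_objective(w_soft, b_soft, X_soft, y_soft, C=1.0, squared=True)
print(zeta, obj_l1, obj_l2)
```

Only the misclassified point gets a non-zero slack, and increasing C makes that single violation dominate the objective.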

Using SVM in scikit-learn

Before using SVM we should standardize the features, because SVM relies on distances between samples and a feature with a large scale would otherwise dominate.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

iris = datasets.load_iris()

X = iris.data
y = iris.target

X = X[y<2,:2]
y = y[y<2]

plt.scatter(X[y==0,0], X[y==0,1], color='red')
plt.scatter(X[y==1,0], X[y==1,1], color='blue')

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X)  # learn the per-feature mean and standard deviation first
X_standard = standardScaler.transform(X)

from sklearn.svm import LinearSVC

svc = LinearSVC(C=1e9)
svc.fit(X_standard, y)

def plot_decision_boundary(model, axis):
    # Build a dense grid covering axis = [x_min, x_max, y_min, y_max]
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    # Predict a class for every grid point and reshape back to the grid
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    # contourf does not use linewidth, so the argument is dropped
    plt.contourf(x0, x1, zz, cmap=custom_cmap)

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
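
The effect of C can also be inspected numerically rather than only visually. The following is a minimal sketch, assuming the same two-class, two-feature iris subset; the variable names and the choice of C=0.01 for the tolerant model are illustrative. Note that `LinearSVC` uses the squared hinge loss by default, so it corresponds to the L2 variant of the soft margin problem.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
mask = iris.target < 2
X_iris = iris.data[mask, :2]
y_iris = iris.target[mask]

X_iris_std = StandardScaler().fit_transform(X_iris)

# A very large C approximates a hard margin; a small C tolerates more violations.
# max_iter is raised as a precaution; the solver may still warn for a huge C.
svc_hard = LinearSVC(C=1e9, max_iter=10000).fit(X_iris_std, y_iris)
svc_soft = LinearSVC(C=0.01, max_iter=10000).fit(X_iris_std, y_iris)

for name, model in [("C=1e9", svc_hard), ("C=0.01", svc_soft)]:
    w = model.coef_[0]
    print(name, "w =", w, "b =", model.intercept_[0],
          "margin width =", 2.0 / np.linalg.norm(w))
```

`coef_` and `intercept_` hold w and b of the fitted boundary, so 2 / ||w|| gives the margin width used in the derivation.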

Using polynomial features and kernel function in SVM

What’s kernel function

In fact, the optimization problem can be converted into the following (dual) problem:
$$\begin{matrix} \max \: \sum_{i=1}^{m} a_{i} - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} a_{i} a_{j} y_{i} y_{j} x_{i} x_{j} \\ \\ s.t. \: \: 0 \leqslant a_{i} \leqslant C, \: \: \sum_{i=1}^{m} a_{i} y_{i} = 0 \end{matrix}$$
where C is the penalty parameter of the Soft Margin SVM. Previously, we used polynomial features to convert the sample $x^{(i)}$ to $x'^{(i)}$ and $x^{(j)}$ to $x'^{(j)}$, and then calculated the product $x'^{(i)} x'^{(j)}$. Now we want a function whose input is $x^{(i)}, x^{(j)}$ and whose output is $x'^{(i)} x'^{(j)}$, so that the product is calculated directly:
$$K(x^{(i)}, x^{(j)}) = x'^{(i)} x'^{(j)}$$
This makes the code run faster and use less memory, because the expanded polynomial features, which are expensive to store, never have to be built.
Any model that only needs to calculate terms of the form $x_{i} x_{j}$ can use a kernel function. The technique is not specific to SVM.

Polynomial function

Take the degree-2 polynomial kernel
$$K(x, y) = (x \cdot y + 1)^{2}$$
Expanding it gives
$$\begin{aligned} K(x, y) =& \left(\sum_{i=1}^{n} x_{i} y_{i} + 1\right)^{2} \\ =& \sum_{i=1}^{n}(x_{i}^{2})(y_{i}^{2}) + \sum_{i=2}^{n}\sum_{j=1}^{i-1}(\sqrt{2} x_{i} x_{j})(\sqrt{2} y_{i} y_{j}) + \sum_{i=1}^{n}(\sqrt{2} x_{i})(\sqrt{2} y_{i}) + 1 \end{aligned}$$
If we define
$$x' = (x_{n}^{2}, \cdots, x_{1}^{2}, \sqrt{2} x_{n} x_{n-1}, \cdots, \sqrt{2} x_{n}, \cdots, \sqrt{2} x_{1}, 1)$$
$$y' = (y_{n}^{2}, \cdots, y_{1}^{2}, \sqrt{2} y_{n} y_{n-1}, \cdots, \sqrt{2} y_{n}, \cdots, \sqrt{2} y_{1}, 1)$$
then we have
$$K(x, y) = x' y'$$
so we can directly calculate
$$K(x, y) = (x \cdot y + 1)^{2}$$
instead of first computing the polynomial features $x'$ and $y'$ and then their product.
Generally speaking, the polynomial kernel function is
$$K(x, y) = (x \cdot y + c)^{d}$$
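
The identity between the kernel and the explicit feature product can be verified numerically. This is a minimal sketch; the function names `poly_kernel` and `explicit_features_deg2` and the sample vectors are assumptions for illustration. The features are emitted in a different order than in the definition above, which does not affect the inner product.

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def explicit_features_deg2(x):
    """Explicit map x -> x' matching K(x, y) = (x . y + 1)^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    squares = x ** 2
    # sqrt(2) * x_i * x_j for every pair with j < i
    cross = [np.sqrt(2.0) * x[i] * x[j] for i in range(1, n) for j in range(i)]
    linear = np.sqrt(2.0) * x
    return np.concatenate([squares, cross, linear, [1.0]])

x_vec = np.array([1.0, 2.0, 3.0])
y_vec = np.array([0.5, -1.0, 2.0])

k_direct = poly_kernel(x_vec, y_vec)  # (4.5 + 1)^2
k_explicit = explicit_features_deg2(x_vec) @ explicit_features_deg2(y_vec)
print(k_direct, k_explicit)
```

For n = 3 the explicit map already has 10 components, while the kernel only needs one dot product in the original 3 dimensions.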

RBF kernel function

The Gaussian kernel function is
$$K(x, y) = e^{-\gamma \left\| x - y \right\|^{2}}$$
It is also called the RBF (Radial Basis Function) kernel. It maps every sample into an infinite-dimensional feature space.
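
As a quick check of the formula, the sketch below computes the kernel by hand and compares it with scikit-learn's `rbf_kernel`. The helper name `gaussian_kernel` and the sample points are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

p = np.array([1.0, 2.0])
q = np.array([2.0, 0.0])

# ||p - q||^2 = 1 + 4 = 5, so the value is exp(-0.5 * 5)
k_manual = gaussian_kernel(p, q, gamma=0.5)
k_sklearn = rbf_kernel(p.reshape(1, -1), q.reshape(1, -1), gamma=0.5)[0, 0]
print(k_manual, k_sklearn)
```

The kernel equals 1 when the two points coincide and decays towards 0 as they move apart, with gamma controlling how fast.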

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-4, 5, 1)
# x is array([-4, -3, -2, -1,  0,  1,  2,  3,  4])
y = np.array((x >= -2) & (x <= 2), dtype='int')
# y is array([0, 0, 1, 1, 1, 1, 1, 0, 0])
plt.scatter(x[y==0], [0]*len(x[y==0]))
plt.scatter(x[y==1], [0]*len(x[y==1]))

def gaussian(x, l):
    gamma = 1.0
    return np.exp(-gamma * (x-l)**2)
l1, l2 = -1, 1

X_new = np.empty((len(x), 2))
for i, data in enumerate(x):
    X_new[i, 0] = gaussian(data, l1)
    X_new[i, 1] = gaussian(data, l2)
plt.scatter(X_new[y==0,0], X_new[y==0,1])
plt.scatter(X_new[y==1,0], X_new[y==1,1])
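
The scatter plot suggests that the mapped points are linearly separable even though the original 1-D data is not. The sketch below checks this with one hypothetical linear rule in the new space, `f1 + f2 > 0.2`; the threshold is an assumption chosen by inspecting the mapped values, not something given in the article.

```python
import numpy as np

x_1d = np.arange(-4, 5, 1)
y_1d = np.array((x_1d >= -2) & (x_1d <= 2), dtype=int)

def gaussian(x, l, gamma=1.0):
    """Gaussian similarity between x and the landmark l."""
    return np.exp(-gamma * (x - l) ** 2)

# Map each 1-D sample to two features, one per landmark (-1 and 1)
features = np.c_[gaussian(x_1d, -1), gaussian(x_1d, 1)]

# Hypothetical linear decision rule in the mapped space
threshold = 0.2
y_rule = (features.sum(axis=1) > threshold).astype(int)
print(y_rule)
print(y_1d)
```

A straight line in the two-landmark feature space therefore reproduces the labels that no single threshold on x could.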

Gamma in RBF kernel

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

X, y = datasets.make_moons(noise=0.15, random_state=666)

plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def RBFKernelSVC(gamma):
    # Standardize the features, then fit an SVC with the RBF kernel
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=gamma))
    ])

svc = RBFKernelSVC(gamma=1)
svc.fit(X, y)
def plot_decision_boundary(model, axis):
    # Build a dense grid covering axis = [x_min, x_max, y_min, y_max]
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    # Predict a class for every grid point and reshape back to the grid
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    # contourf does not use linewidth, so the argument is dropped
    plt.contourf(x0, x1, zz, cmap=custom_cmap)

plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])

svc_gamma100 = RBFKernelSVC(gamma=100)
svc_gamma100.fit(X, y)
plot_decision_boundary(svc_gamma100, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])

svc_gamma10 = RBFKernelSVC(gamma=10)
svc_gamma10.fit(X, y)
plot_decision_boundary(svc_gamma10, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])

svc_gamma05 = RBFKernelSVC(gamma=0.5)
svc_gamma05.fit(X, y)
plot_decision_boundary(svc_gamma05, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])

svc_gamma01 = RBFKernelSVC(gamma=0.1)
svc_gamma01.fit(X, y)
plot_decision_boundary(svc_gamma01, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
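
The plots compare the decision boundaries by eye. A rough numerical companion is to print the training accuracy and the number of support vectors for each gamma. This is a minimal sketch on the same moons data; the helper name `rbf_svc` is an assumption. A large gamma makes each Gaussian narrow, so the boundary tends to wrap around individual samples (overfitting), while a small gamma gives a smoother, almost linear boundary (underfitting). Training accuracy alone cannot confirm overfitting; a held-out split would be needed for that.

```python
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_moons, y_moons = datasets.make_moons(noise=0.15, random_state=666)

def rbf_svc(gamma):
    """Standardize the features, then fit an RBF-kernel SVC."""
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=gamma)),
    ])

train_scores = {}
for gamma in [0.1, 0.5, 1, 10, 100]:
    model = rbf_svc(gamma).fit(X_moons, y_moons)
    train_scores[gamma] = model.score(X_moons, y_moons)
    n_sv = model.named_steps["svc"].n_support_.sum()
    print(f"gamma={gamma}: train accuracy={train_scores[gamma]:.3f}, "
          f"support vectors={n_sv}")
```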

Solving regression problems with SVM


import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

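# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this call needs an older scikit-learn release.
boston = datasets.load_boston()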
X = boston.data
y = boston.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

from sklearn.svm import LinearSVR
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def StandardLinearSVR(epsilon=0.1):
    # Standardize the features, then fit a linear support vector regressor
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('linearSVR', LinearSVR(epsilon=epsilon))
    ])

svr = StandardLinearSVR()
svr.fit(X_train, y_train)

svr.score(X_test, y_test)
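
The snippet above imports `SVR` without using it. The following is a minimal sketch of the same pipeline idea with `SVR`, run on synthetic data from `make_regression` as a stand-in, since `load_boston` is not available in recent scikit-learn releases. The dataset, the helper name `StandardSVR`, and the values of `C` and `epsilon` are assumptions for illustration, not the article's setup.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption, not the article's dataset)
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0,
                               random_state=666)
X_tr, X_te, y_tr, y_te = train_test_split(X_reg, y_reg, random_state=666)

def StandardSVR(kernel="linear", C=1.0, epsilon=0.1):
    """Standardize the features, then fit an SVR with the given kernel."""
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svr", SVR(kernel=kernel, C=C, epsilon=epsilon)),
    ])

svr_linear = StandardSVR(kernel="linear", C=100.0).fit(X_tr, y_tr)
r2_linear = svr_linear.score(X_te, y_te)  # R^2 on the held-out split
print("R^2 on the test split:", r2_linear)
```

Passing `kernel="rbf"` or `kernel="poly"` to the same helper switches to a kernelized regressor; `C` and `epsilon` would normally be tuned, for example with a grid search.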
