
Support Vector Machine

What’s SVM

Find the optimal decision boundary such that, among the points of the two classes, the points closest to the boundary are as far away from it as possible (max margin), where $margin = 2d$.
The basic formulation solves the linearly separable problem; there are two variants:

Hard Margin SVM

Soft Margin SVM


The optimization problem behind the SVM

The distance from a point to the separating hyperplane is
$$d = \frac{\left| w^{T} \cdot x+b\right|}{\left\| w \right\|},\qquad \left\| w \right\| = \sqrt{w_{1}^{2}+w_{2}^{2}+\cdots+w_{n}^{2}}$$
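
As a quick sanity check of this formula, here is a toy 2-D example of my own (not from the original post):

import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weights
b = -5.0                   # hypothetical intercept
x = np.array([2.0, 1.0])   # a sample point

np.abs(w @ x + b) / np.linalg.norm(w)   # |w^T·x + b| / ||w|| = |3*2 + 4*1 - 5| / 5 = 1.0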

For the classification result, we want
$$\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\|}\geq d & \forall\, y^{(i)}=1 \\ \\ \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\|}\leq -d & \forall\, y^{(i)}=-1 \end{matrix}\right.$$

Dividing both sides by $d$ gives

$$\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\| d}\geq 1 & \forall\, y^{(i)}=1 \\ \\ \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\| d}\leq -1 & \forall\, y^{(i)}=-1 \end{matrix}\right.$$
The denominator $\left\| w \right\| d$ is a constant, so it can be absorbed into the parameters:
$$\left\{\begin{matrix} w_{d}^{T}\cdot x^{(i)}+b_{d}\geq 1 & \forall\, y^{(i)}=1 \\ \\ w_{d}^{T}\cdot x^{(i)}+b_{d}\leq -1 & \forall\, y^{(i)}=-1 \end{matrix}\right.$$
In other words, renaming $w_{d}, b_{d}$ back to $w, b$, we have
$$\left\{\begin{matrix} w^{T}\cdot x^{(i)}+b\geq 1 & \forall\, y^{(i)}=1 \\ \\ w^{T}\cdot x^{(i)}+b\leq -1 & \forall\, y^{(i)}=-1 \end{matrix}\right.$$

In the end, we have
$$y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1$$
So if we want to maximize $d$, we just need to maximize $\frac{\left| w^{T} \cdot x+b\right|}{\left\| w \right\|}$ for the points on the margin. For every sample the numerator satisfies
$$\left| w^{T}\cdot x^{(i)}+b \right|\geq 1$$
with equality for the support vectors (the points lying on the margin), so $d = \frac{1}{\left\| w \right\|}$, and we just need to maximize $\frac{1}{\left\| w \right\|}$, which is equivalent to minimizing $\left\| w \right\|$.

In the end, the optimization problem is
$$\begin{matrix} \min\ \ \frac{1}{2}\left\| w \right\|^{2} \\ \\ s.t.\ \ y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1 \end{matrix}$$
(We minimize $\frac{1}{2}\left\| w \right\|^{2}$ rather than $\left\| w \right\|$ because it is differentiable and leads to the same solution.)
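
As a small illustration (a sketch on assumed toy data, not part of the original post), the margin width $\frac{2}{\left\| w \right\|}$ can be read off a fitted linear SVM:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# toy, (nearly) linearly separable data
X_toy, y_toy = make_blobs(n_samples=40, centers=2, random_state=0)

svc_toy = LinearSVC(C=1e9, max_iter=10000)   # very large C ~ hard margin
svc_toy.fit(X_toy, y_toy)

w = svc_toy.coef_[0]
2 / np.linalg.norm(w)   # margin width 2/||w||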

Soft Margin SVM

In some cases the data is not linearly separable, so each sample is allowed a slack variable $\zeta_{i}$:
$$\min\ \ \frac{1}{2}\left\| w \right\|^{2}+C\sum_{i=1}^{m}\zeta_{i}\qquad s.t.\ \ y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta_{i},\ \ \zeta_{i}\geq 0$$
We call this the L1-norm soft margin. Alternatively, we can use the L2 norm:
$$\min\ \ \frac{1}{2}\left\| w \right\|^{2}+C\sum_{i=1}^{m}\zeta_{i}^{2}\qquad s.t.\ \ y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta_{i},\ \ \zeta_{i}\geq 0$$
The parameter $C$ controls the degree of fault tolerance: the larger $C$ is, the more heavily margin violations are penalized; the smaller $C$ is, the more violations are tolerated.
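
To see what $C$ does in practice, here is a hedged sketch (toy data of my own) that counts margin violations $y^{(i)}(w^{T}\cdot x^{(i)}+b) < 1$ for a loose and a tight soft margin:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
X_toy = StandardScaler().fit_transform(X_toy)
y_signed = np.where(y_toy == 1, 1, -1)   # map labels {0, 1} to {-1, +1}

for C in (0.01, 1e9):
    svc = LinearSVC(C=C, max_iter=10000).fit(X_toy, y_toy)
    violations = np.sum(y_signed * svc.decision_function(X_toy) < 1)
    print(C, violations)   # a smaller C usually tolerates more violations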

Using SVM in scikit-learn

Before using SVM we should standardize the data, because the algorithm relies on distances between samples.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

iris = datasets.load_iris()

X = iris.data
y = iris.target

X = X[y<2,:2]
y = y[y<2]


plt.scatter(X[y==0,0], X[y==0,1], color='red')
plt.scatter(X[y==1,0], X[y==1,1], color='blue')
plt.show()

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X)
X_standard = standardScaler.transform(X)

from sklearn.svm import LinearSVC

svc = LinearSVC(C=1e9)   # a very large C approximates a hard margin
svc.fit(X_standard, y)


def plot_decision_boundary(model, axis):
    
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    
    plt.contourf(x0, x1, zz, cmap=custom_cmap)  # contourf does not use a 'linewidth' kwarg

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()
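
As an optional extension (my own sketch, continuing from the cells above), the margin lines $w^{T}\cdot x+b=\pm 1$ can be drawn from the fitted model's coef_ and intercept_:

w = svc.coef_[0]
b = svc.intercept_[0]

plot_x = np.linspace(-3, 3, 200)
# solve w0*x0 + w1*x1 + b = +1 and -1 for x1
up_y = (1 - b - w[0] * plot_x) / w[1]
down_y = (-1 - b - w[0] * plot_x) / w[1]

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.plot(plot_x, up_y, 'k--')
plt.plot(plot_x, down_y, 'k--')
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.ylim(-3, 3)
plt.show()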

Using polynomial features and kernel functions in SVM

What's a kernel function

Actually, we can convert the optimization problem into its dual form:
$$\begin{matrix} \max\ \sum_{i=1}^{m}a_{i}-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}a_{i}a_{j}y_{i}y_{j}\,x_{i}\cdot x_{j} \\ \\ s.t.\ \ 0\leqslant a_{i}\leqslant C,\ \ \sum_{i=1}^{m}a_{i}y_{i}=0 \end{matrix}$$
In the past, we used polynomial features to convert the example $x^{(i)}$ to $x'^{(i)}$ and $x^{(j)}$ to $x'^{(j)}$, and then calculated the product $x'^{(i)}\cdot x'^{(j)}$. Now we want a function whose input is $x^{(i)}, x^{(j)}$ and whose output is $x'^{(i)}\cdot x'^{(j)}$, so that the product is calculated directly:
$$K(x^{(i)},x^{(j)})=x'^{(i)}\cdot x'^{(j)}$$
This makes the code run faster and occupy less memory, since representing explicit polynomial features is expensive.
As long as a model needs to compute terms of the form $x_{i}\cdot x_{j}$, we can use a kernel function; the trick is not exclusive to SVM.
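
In scikit-learn, choosing a kernel is just a matter of the kernel parameter of SVC. A hedged sketch (using moons data like the one later in this post):

from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_moons, y_moons = datasets.make_moons(noise=0.15, random_state=666)

poly_kernel_svc = Pipeline([
    ("std_scaler", StandardScaler()),
    ("kernel_svc", SVC(kernel="poly", degree=3, coef0=1, C=1.0))
])
poly_kernel_svc.fit(X_moons, y_moons)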

Polynomial kernel function

Consider the degree-2 polynomial kernel
$$K(x, y)=(x\cdot y+1)^{2}$$
Expanding it,
$$\begin{aligned} K(x, y)=&\left(\sum_{i=1}^{n}x_{i}y_{i}+1\right)^{2} \\ =&\sum_{i=1}^{n}(x_{i}^{2})(y_{i}^{2})+\sum_{i=2}^{n}\sum_{j=1}^{i-1}(\sqrt{2}x_{i}x_{j})(\sqrt{2}y_{i}y_{j})+\sum_{i=1}^{n}(\sqrt{2}x_{i})(\sqrt{2}y_{i})+1 \end{aligned}$$
If we define
$$x' = (x_{n}^{2},\cdots,x_{1}^{2},\sqrt{2}x_{n}x_{n-1},\cdots,\sqrt{2}x_{n},\cdots,\sqrt{2}x_{1},1)$$
$$y' = (y_{n}^{2},\cdots,y_{1}^{2},\sqrt{2}y_{n}y_{n-1},\cdots,\sqrt{2}y_{n},\cdots,\sqrt{2}y_{1},1)$$
then we have
$$K(x,y)=x'\cdot y'$$
So we can directly calculate $K(x, y)=(x\cdot y+1)^{2}$ instead of constructing the polynomial features and computing $x'\cdot y'$.
Generally speaking, the polynomial kernel function is
$$K(x, y)=(x\cdot y+c)^{d}$$
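
A quick numerical check of this identity for $n=2$ (toy vectors of my own):

import numpy as np

def phi(v):
    # explicit degree-2 polynomial feature map for a 2-D vector
    v1, v2 = v
    return np.array([v1**2, v2**2,
                     np.sqrt(2) * v1 * v2,
                     np.sqrt(2) * v1,
                     np.sqrt(2) * v2,
                     1.0])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

(a @ b + 1) ** 2          # kernel value computed directly: 4.0
phi(a) @ phi(b)           # the same value via the explicit features: 4.0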

RBF kernel function

The Gaussian kernel function is
$$K(x,y)=e^{-\gamma \left\| x-y \right\|^{2}}$$
It is also called the RBF (Radial Basis Function) kernel. It maps every sample into an infinite-dimensional feature space.
Gaussian kernel: every data point acts as a landmark, so an $m \times n$ data set is mapped into an $m \times m$ one.

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-4, 5, 1)
x
array([-4, -3, -2, -1,  0,  1,  2,  3,  4])
y = np.array((x >= -2) & (x <= 2), dtype='int')
y
array([0, 0, 1, 1, 1, 1, 1, 0, 0])
plt.scatter(x[y==0], [0]*len(x[y==0]))
plt.scatter(x[y==1], [0]*len(x[y==1]))
plt.show()


def gaussian(x, l):
    # Gaussian (RBF) feature value of sample x with respect to landmark l
    gamma = 1.0
    return np.exp(-gamma * (x-l)**2)
l1, l2 = -1, 1

X_new = np.empty((len(x), 2))
for i, data in enumerate(x):
    X_new[i, 0] = gaussian(data, l1)
    X_new[i, 1] = gaussian(data, l2)
plt.scatter(X_new[y==0,0], X_new[y==0,1])
plt.scatter(X_new[y==1,0], X_new[y==1,1])
plt.show()


Gamma in RBF kernel

The parameter gamma is the coefficient $\gamma$ in the Gaussian kernel: the larger gamma is, the narrower each Gaussian bump becomes and the more locally the boundary fits the training points (risking overfitting); a smaller gamma gives a smoother boundary (risking underfitting).

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

X, y = datasets.make_moons(noise=0.15, random_state=666)

plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def RBFKernelSVC(gamma):
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=gamma))
    ])
svc = RBFKernelSVC(gamma=1)
svc.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
def plot_decision_boundary(model, axis):
    
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    
    plt.contourf(x0, x1, zz, cmap=custom_cmap)  # contourf does not use a 'linewidth' kwarg
plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma100 = RBFKernelSVC(gamma=100)
svc_gamma100.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=100, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma100, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma10 = RBFKernelSVC(gamma=10)
svc_gamma10.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=10, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma10, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma05 = RBFKernelSVC(gamma=0.5)
svc_gamma05.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.5, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma05, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


svc_gamma01 = RBFKernelSVC(gamma=0.1)
svc_gamma01.fit(X, y)
Pipeline(memory=None,
     steps=[('std_scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
plot_decision_boundary(svc_gamma01, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


Solving regression problems with SVM

For regression we want as many training points as possible to fall inside the margin (an $\epsilon$-tube around the fitted line); the width of the tube is controlled by the hyperparameter epsilon.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

boston = datasets.load_boston()   # note: load_boston was removed in scikit-learn 1.2; on newer versions substitute another regression dataset
X = boston.data
y = boston.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)


from sklearn.svm import LinearSVR
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def StandardLinearSVR(epsilon=0.1):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('linearSVR', LinearSVR(epsilon=epsilon))
    ])

svr = StandardLinearSVR()
svr.fit(X_train, y_train)

svr.score(X_test, y_test)
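
The SVR imported above can be used in the same way with an RBF kernel; a hedged sketch (hyperparameters are illustrative, not tuned):

def StandardRBFSVR(epsilon=0.1, C=1.0):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('rbfSVR', SVR(kernel='rbf', epsilon=epsilon, C=C))
    ])

rbf_svr = StandardRBFSVR()
rbf_svr.fit(X_train, y_train)
rbf_svr.score(X_test, y_test)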
