Logistic Regression and Its Applications

程序员文章站 2022-07-06 11:08:16


1. Algorithm overview

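Logistic regression builds on linear regression and addresses two of its shortcomings when it is used for classification: the assumption of statistically independent errors, and membership-function outputs that are not proper probabilities. It transforms the target variable and then fits a linear model to the transformed value. The transformation is the logit transformation: logit(p) = log(p / (1 - p)), where p is the class probability. Its inverse is the sigmoid transformation: sigm(x) = 1 / (1 + e^(-x)), which gives the resulting model.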
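To make the relationship between the two transformations concrete, here is a minimal sketch using only the standard library. The function names logit and sigmoid are chosen for illustration; the loop simply checks that applying sigmoid to a logit returns the original probability.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of logit: maps any real number x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Round trip: sigmoid(logit(p)) should give back p (up to rounding)
for p in (0.1, 0.5, 0.9):
    x = logit(p)
    print(f"p={p:.1f}  logit={x:+.4f}  sigmoid(logit)={sigmoid(x):.4f}")
```

Probabilities below 0.5 map to negative log-odds and probabilities above 0.5 to positive ones, which is why a linear model on the logit scale can range over all of R.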

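What remains is to determine the parameters of the linear expression for the score x: x = sum over i of w[i] * a[i], where w holds the weights to be determined and a holds the attribute values. For a given training instance, or an instance to be predicted, the score function produces a value on R, and the sigmoid transformation then maps that value to a probability in (0, 1).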
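The two steps described above (linear score, then sigmoid) can be sketched as follows. The weights and attribute values are hypothetical numbers picked for illustration, and treating a[0] = 1 so that w[0] acts as the intercept is an assumption of this sketch, not something the text prescribes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(weights, attributes):
    """Linear score x = sum_i w[i] * a[i]."""
    return sum(w * a for w, a in zip(weights, attributes))

# Hypothetical values; a[0] = 1 makes w[0] the intercept (assumption)
w = [-1.0, 2.0, 0.5]
a = [1.0, 0.8, -0.4]

x = score(w, a)   # real-valued score on R
p = sigmoid(x)    # probability in (0, 1)
print(f"score = {x:.3f}, probability = {p:.3f}")
```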

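That is the basic idea of logistic regression. In the experiments here, we use logistic regression to handle classification problems. How is that done?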

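In fact, any regression technique, linear or nonlinear, can be used for classification. The trick is to run one regression per class, setting the output to 1 for training instances that belong to the class and to 0 for those that do not. This gives a regression function for each class (also called a membership function). For a test instance of unknown class, compute the value of every regression function and pick the class whose value is largest. This approach is sometimes called multi-response regression.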
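As a rough illustration of the one-regression-per-class trick, the sketch below uses ordinary least squares (numpy.linalg.lstsq) as the regressor rather than logistic regression, on a small synthetic three-class dataset. The cluster centers, noise level and variable names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: three 2-D clusters, 20 points each (hypothetical)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centers])
y = np.repeat(np.arange(3), 20)

# One 0/1 target column per class: 1 if the instance belongs to it, else 0
A = np.hstack([np.ones((len(X), 1)), X])   # prepend a bias column
T = np.eye(3)[y]                            # shape (60, 3)

# Fit all three membership functions at once, one column of W per class
W, *_ = np.linalg.lstsq(A, T, rcond=None)   # shape (3, 3)

# Classify by picking the class with the largest membership value
pred = np.argmax(A @ W, axis=1)
accuracy = np.mean(pred == y)
print("training accuracy:", accuracy)
```

Swapping the least-squares fit for a per-class logistic model keeps the same structure: one membership function per class, then an argmax over their outputs.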

2. Library API

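With numpy, matplotlib and sklearn, most machine learning algorithms can be implemented in Python. numpy is Python's scientific computing library and mainly provides the data structures; matplotlib is a visualization library with strong plotting capabilities; sklearn is a machine learning library that offers a rich set of algorithm interfaces.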

LogisticRegression is an algorithm (a class) in sklearn.linear_model. Its main API is as follows:

Attributes:

coef_
intercept_

Note: coef_ (coefficients) holds the fitted weights, and intercept_ holds the constant term.

Methods:

fit(X, y[, sample_weight])
predict(X)
score(X, y[, sample_weight])

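Note: fit builds the model from the given training instances and their labels, predict makes predictions with the fitted model, and score returns the mean accuracy on the given instances and labels. Note that predict returns the class label directly rather than a probability for each class; the per-class probabilities are available from predict_proba.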
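A short sketch of how these attributes and methods fit together, on a tiny hand-made 1-D dataset (the data values are hypothetical). It uses the default LogisticRegression settings, which differ from the C=1e5 used in the experiments in this article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny hypothetical dataset: one feature, two classes
X = np.array([[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)                                  # build the model

print(clf.coef_.shape, clf.intercept_.shape)   # (1, 1) (1,)

labels = clf.predict(X)        # class labels, not probabilities
proba = clf.predict_proba(X)   # one column of probabilities per class
acc = clf.score(X, y)          # mean accuracy on (X, y)
print(labels, acc)
```

For a binary problem coef_ has a single row; with K > 2 classes it has K rows, one per class, which is what the 3-D plots later in the article rely on.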

3. Applications

Binary classification
Multiclass classification

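1. Binary classification
Logistic regression can handle binary classification directly within the regression model. Because there are only two possible outcomes, and an instance belongs to either one or the other, there is no need to build two membership functions. Simply compare the regression value with 0.5: instances above 0.5 go to one class, and the rest go to the other.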
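Comparing the sigmoid output with 0.5 is the same as checking the sign of the linear score, which for a single feature reduces to comparing x with -intercept / coef. The sketch below checks that the three formulations agree; the values of w and b are hypothetical, not fitted results.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical fitted parameters for a 1-D model (assumed, with w > 0)
w, b = 2.0, -1.0
boundary = -b / w                    # x value where sigmoid(w*x + b) == 0.5

xs = np.array([-1.0, 0.0, 0.4, 0.6, 3.0])

by_prob = sigmoid(w * xs + b) > 0.5  # threshold the regression value
by_score = (w * xs + b) > 0          # sign of the linear score
by_boundary = xs > boundary          # position relative to the boundary

print(boundary, by_prob)
```

If w were negative, the last comparison would flip to xs < boundary; the first two formulations stay the same.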

In the experiment below, the data are generated from a normal distribution, with Gaussian noise added.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Synthesize the normal distributed data with Gaussian noise
n_samples = 30
np.random.seed(0)
X = np.random.normal(size=n_samples)
X[X > 0] *= 4
X += .3 * np.random.normal(size=n_samples)
# The author assigned the class labels before adding the noise.
# What is the difference, and why?
Y = (X > 0).astype(float)  # np.float was removed in NumPy 1.24; use the builtin float
X_test = np.linspace(-5, 10, 300)

# Initialize the classifier
X = X[:, np.newaxis]
logreg = linear_model.LogisticRegression(C=1e5)
logreg.fit(X, Y)

# Visualization
plt.figure(1, figsize=(8, 6))
plt.ylabel('Class')
plt.xlabel('X')
plt.xticks(range(-5, 10))
plt.yticks([0, 0.5, 1])
plt.ylim(-.25, 1.25)
plt.xlim(-4, 10)
plt.scatter(X.ravel(), Y, c='k')
coef = logreg.coef_.ravel()[0]
intercept = logreg.intercept_.ravel()[0]
plt.plot(X_test, sigmoid(coef * X_test + intercept), label='Regression Curve')
plt.axhline(.5, c='.5', linestyle='--', label='Decision Boundary')
font = {
    'family' : 'serif',
    'color' : 'k',
    'weight' : 'normal',
    'size' : 12
}
plt.text(5, 0.25,
         'Reg(x) = Sigmoid(%.2f * x + %.2f)'%(coef, intercept),
         horizontalalignment='center', fontdict=font)
plt.legend()
plt.show()


2. Multiclass classification
Load the iris dataset from sklearn and use its first two features as the experimental data.

from sklearn import datasets
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Load dataset and pick up the needed
iris = datasets.load_iris()
X = iris.data[:, :2]
Y = iris.target
B = iris.feature_names[:2]

# Initialize the classifier
logreg = linear_model.LogisticRegression(C=1e5)
logreg.fit(X, Y)
coef = logreg.coef_
intercept = logreg.intercept_

# Matrix build
res = .05
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, res), np.arange(y_min, y_max, res))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
print(logreg.score(X, Y))
Z = Z.reshape(xx.shape)

# Visualization
plt.figure(1, figsize=(8, 6))
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xlabel(B[0])
plt.ylabel(B[1])
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Set1)
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Set1)
plt.show()


With matplotlib's Axes3D, the role the logistic function plays in classification can be seen clearly.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import linear_model
from mpl_toolkits.mplot3d import Axes3D

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def linear_com(data, c, i):
    return data[0] * c[0] + data[1] * c[1] + i

# Load dataset and pick up the needed
iris = datasets.load_iris()
X = iris.data[:, :2]
Y = iris.target
B = iris.feature_names[:2]

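# Initialize the classifier
# Note: sigmoid(w_i . x + b_i) is the class-i-vs-rest probability only for a
# one-vs-rest fit. Recent scikit-learn versions fit a multinomial (softmax) model
# by default, so the surfaces below are then per-class sigmoid scores rather than
# the probabilities returned by predict_proba.
logreg = linear_model.LogisticRegression(C=1e5)
logreg.fit(X, Y)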
coef = logreg.coef_
intercept = logreg.intercept_

# Matrix build
res = .1
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, res), np.arange(y_min, y_max, res))
xi = xx.ravel()
yi = yy.ravel()
tot = len(xi)
val = []
for i in range(3):
    tmp = []
    for p in range(tot):
        tmp.append(sigmoid(linear_com([xi[p], yi[p]], coef[i], intercept[i])))
    tmp = np.array(tmp)
    tmp = tmp.reshape(xx.shape)
    val.append(tmp)
val = np.array(val)

# Visualization
fig = plt.figure(1, figsize=(15, 4))
for i in range(3):
    ax = fig.add_subplot(1, 3, i + 1, projection='3d')
    ax.plot_surface(xx, yy, val[i], cmap=plt.cm.coolwarm, linewidth=0, antialiased=False)
    ax.set_xlabel(B[0])
    ax.set_ylabel(B[1])
    ax.set_zlabel('sigmoid output for class ' + str(i))
plt.show()
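The nested loops that fill val can also be written with numpy broadcasting. The sketch below is one possible vectorized form; coef and intercept here are hypothetical stand-ins with the same shapes as logreg.coef_ and logreg.intercept_ for three classes and two features, and the grid bounds are made up, so it runs without fitting a model.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical stand-ins for logreg.coef_ (3, 2) and logreg.intercept_ (3,)
coef = np.array([[-2.0, 1.5], [0.5, -1.0], [1.5, -0.5]])
intercept = np.array([5.0, 1.0, -6.0])

# Grid built the same way as in the listing above (bounds are assumptions)
xx, yy = np.meshgrid(np.arange(4.0, 8.0, 0.1), np.arange(1.5, 4.5, 0.1))

grid = np.stack([xx, yy], axis=-1)         # shape (ny, nx, 2)
val = sigmoid(grid @ coef.T + intercept)   # shape (ny, nx, 3)
val = np.moveaxis(val, -1, 0)              # shape (3, ny, nx), like the loop

# Spot check against the scalar formula used in linear_com
i, r, c = 1, 3, 7
expected = sigmoid(xx[r, c] * coef[i, 0] + yy[r, c] * coef[i, 1] + intercept[i])
print(np.isclose(val[i, r, c], expected))
```

The result has the same layout as the val array built by the loops, so val[i] can be passed to plot_surface unchanged.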
