欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

线性回归的N种姿势

程序员文章站 2022-05-22 13:06:14
...

1. Naive Solution

We want to find x^ that minimizes:

||Ax^b||2

Another way to think about this is that we are interested in where vector b is closest to the subspace spanned by A (called the range of A). Ax^ is the projection of b onto A. Since bAx^ must be perpendicular to the subspace spanned by A, we see that

AT(bAx^)=0

线性回归的N种姿势
(we are using AT because we want to multiply each column of A by bAx^

This leads us to the normal equations:

x=(ATA)1ATb

但是这种方法不是很practical,因为要求矩阵的逆,比较麻烦。

def ls_naive(A, b):
    # not practical,因为直接求了矩阵的逆,很麻烦
     return np.linalg.inv(A.T @ A) @ A.T @ b

coeffs_naive = ls_naive(trn_int, y_trn)
regr_metrics(y_test, test_int @ coeffs_naive)

2. Cholesky Factorization

Normal equations:

ATAx=ATb

If A has full rank, the pseudo-inverse (ATA)1AT is a square, hermitian positive definite matrix.

【注意】Cholesky一定要是正定矩阵!虽然这个操作又快又好,但是使用场景有限!

The standard way of solving such a system is Cholesky Factorization, which finds upper-triangular R such that ATA=RTR. ( RT is lower-triangular )

AtA = A.T @ A
Atb = A.T @ b

R = scipy.linalg.cholesky(AtA)

# check our factorization
np.linalg.norm(AtA - R.T @ R)

Q:为啥变成 RTRx=ATb 会变简单?

之前提过,上三角矩阵解线性方程组时会简单许多(最后一行只有一个未知数)。

所以先求 RTw=ATb 再求 Rx=w 会easier。

ATAx=ATbRTRx=ATbRTw=ATbRx=w

def ls_chol(A, b):
    R = scipy.linalg.cholesky(A.T @ A)
    w = scipy.linalg.solve_triangular(R, A.T @ b, trans='T')
    return scipy.linalg.solve_triangular(R, w)

%timeit coeffs_chol = ls_chol(trn_int, y_trn)

3. QR Factorization

Q is orthonormal, R is upper-triangular.

Ax=bA=QRQRx=bQ1=QTQTQRx=QTbRx=QTb

Then once again, Rx is easier to solve than Ax.

def ls_qr(A,b):
    Q, R = scipy.linalg.qr(A, mode='economic')
    return scipy.linalg.solve_triangular(R, Q.T @ b)

coeffs_qr = ls_qr(trn_int, y_trn)
regr_metrics(y_test, test_int @ coeffs_qr)

4. SVD

都是同样的问题,要求 xU 的column orthonormal, 是对角阵,V 是row orthonormal.

Ax=bA=UΣVΣVx=UTbΣw=UTbx=VTw

Σw=UTb 甚至比上三角矩阵 R 更简单(每行只有一个未知量)。然后 Vx=w 又可以直接用orthonormal的性质,直接等于 VTw

SVD gives the pseudo-inverse.

def ls_svd(A,b):
    m, n = A.shape
    U, sigma, Vh = scipy.linalg.svd(A, full_matrices=False, lapack_driver='gesdd')
    w = (U.T @ b)/ sigma
    return Vh.T @ w

coeffs_svd = ls_svd(trn_int, y_trn)
regr_metrics(y_test, test_int @ coeffs_svd)

5. Timing Comparison

Matrix Inversion: 2n3

Matrix Multiplication: n3

Cholesky: 13n3

QR, Gram Schmidt: 2mn2, mn (chapter 8 of Trefethen)

QR, Householder: 2mn223n3 (chapter 10 of Trefethen)

Solving a triangular system: n2

线性回归的N种姿势
Why Cholesky Factorization is Fast:
线性回归的N种姿势
(source: Stanford Convex Optimization: Numerical Linear Algebra Background Slides)

相关标签: 线性代数