Computing the Euclidean Distance Between Two Matrices

程序员文章站 2022-07-12 19:55:19


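When using a k-NN model, we need the Euclidean distance from every point in the test set to every point in the training set, that is, the pairwise Euclidean distances between the rows of two matrices. A k-NN implementation typically computes these in one of three ways: with two nested loops, with a single loop, or with no explicit loops.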

Using two loops

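Loop over both the test set and the training set, compute the Euclidean distance between each pair of points, and store it in the dists matrix. This version is not optimized in any way.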

def compute_distances_two_loops(self, X):
    # Assumes numpy is imported as np, as in the other methods of this class.
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):  # use xrange on Python 2
      for j in range(num_train):
        # Compute the l2 distance between the ith test point and the jth
        # training point, and store the result in dists[i, j]. There is no
        # loop over the feature dimension: np.sum handles it.
        dists[i, j] = np.sqrt(np.sum(np.square(X[i] - self.X_train[j])))
    return dists
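
The method above relies on a classifier object that holds `self.X_train`. As a quick sanity check, the same double loop can be sketched as a standalone function. The name `euclidean_two_loops` and the toy arrays are made up for illustration and are not part of the original code.

```python
import numpy as np

def euclidean_two_loops(X, X_train):
    """Hypothetical standalone version of the two-loop method."""
    num_test, num_train = X.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # L2 distance between test point i and training point j
            dists[i, j] = np.sqrt(np.sum(np.square(X[i] - X_train[j])))
    return dists

# Toy data: 2 test points and 3 training points in 2 dimensions
X = np.array([[0.0, 0.0], [3.0, 4.0]])
X_train = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
dists = euclidean_two_loops(X, X_train)
print(dists.shape)  # (2, 3): one row per test point, one column per training point
print(dists)
```

Each entry `dists[i, j]` is the distance from test point i to training point j, so the result always has shape (number of test points, number of training points).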

Using one loop

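Represent the training set as a matrix and compute the distances from each test point to the whole training matrix at once. This reduces the algorithm to a single loop.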

def compute_distances_one_loop(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a single loop over the test data.
    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):  # use xrange on Python 2
      # Compute the l2 distance between the ith test point and all training
      # points, and store the result in dists[i, :]. Broadcasting subtracts
      # X[i] from every row of self.X_train; axis=1 sums over the features.
      dists[i] = np.sqrt(np.sum(np.square(self.X_train - X[i]), axis=1))
    return dists
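
The single-loop version works because NumPy broadcasting subtracts one test point from every training row in one step. The following is a minimal sketch of the shapes involved for one iteration of the loop; the array values and the variable names `x`, `diff`, `sq_sum`, and `row` are chosen here for illustration only.

```python
import numpy as np

X_train = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])  # shape (N, D) = (3, 2)
x = np.array([3.0, 4.0])                                   # one test point, shape (D,)

diff = X_train - x                        # x is subtracted from every row, shape (3, 2)
sq_sum = np.sum(np.square(diff), axis=1)  # sum over the D features, shape (3,)
row = np.sqrt(sq_sum)                     # distances from x to each training point
print(diff.shape, sq_sum.shape)
print(row)
```

The vector `row` is what the loop body stores in `dists[i]` for that test point.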

Using no loops

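The most efficient of the three approaches represents both the training set and the test set as matrices and replaces the loops with matrix operations. It does require being comfortable with the rules of matrix arithmetic. The rest of this note focuses on how to compute the Euclidean distances between two matrices.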

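Let the test-set matrix P have size M×D and the training-set matrix C have size N×D (the test set has M points and the training set has N points, each a D-dimensional feature vector).
Let $P_{i}$ denote the i-th row of P and $C_{j}$ the j-th row of C:
$P_{i} = [P_{i1}, P_{i2}, \dots, P_{iD}]$, $C_{j} = [C_{j1}, C_{j2}, \dots, C_{jD}]$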

First compute the distance dist(i, j) between $P_{i}$ and $C_{j}$:
$d(P_{i}, C_{j}) = \sqrt{(P_{i1} - C_{j1})^{2} + (P_{i2} - C_{j2})^{2} + \dots + (P_{iD} - C_{jD})^{2}}$
$= \sqrt{(P_{i1}^{2} + P_{i2}^{2} + \dots + P_{iD}^{2}) + (C_{j1}^{2} + C_{j2}^{2} + \dots + C_{jD}^{2}) - 2 (P_{i1} C_{j1} + P_{i2} C_{j2} + \dots + P_{iD} C_{jD})}$
$= \sqrt{\lVert P_{i} \rVert^{2} + \lVert C_{j} \rVert^{2} - 2 P_{i} C_{j}^{T}}$
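
The expansion of the squared difference into two squared norms and a dot product can be checked numerically. The snippet below is only a sanity check on random vectors; `p` and `c` stand in for one row $P_{i}$ and one row $C_{j}$, and the seed and dimension are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the check is repeatable
p = rng.normal(size=5)           # stands in for one row P_i (D = 5)
c = rng.normal(size=5)           # stands in for one row C_j

# Direct definition: square root of the summed squared differences
direct = np.sqrt(np.sum((p - c) ** 2))
# Expanded form: ||p||^2 + ||c||^2 - 2 p.c
expanded = np.sqrt(np.dot(p, p) + np.dot(c, c) - 2 * np.dot(p, c))
print(direct, expanded)
print(np.isclose(direct, expanded))
```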

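This generalizes to row i of the distance matrix. Each bracket below is a row vector of length N, and the square root is applied element by element:
$\mathrm{dist}[i] = \sqrt{[\lVert P_{i} \rVert^{2}, \lVert P_{i} \rVert^{2}, \dots, \lVert P_{i} \rVert^{2}] + [\lVert C_{1} \rVert^{2}, \lVert C_{2} \rVert^{2}, \dots, \lVert C_{N} \rVert^{2}] - 2 P_{i} [C_{1}^{T}, C_{2}^{T}, \dots, C_{N}^{T}]}$
$= \sqrt{[\lVert P_{i} \rVert^{2}, \lVert P_{i} \rVert^{2}, \dots, \lVert P_{i} \rVert^{2}] + [\lVert C_{1} \rVert^{2}, \lVert C_{2} \rVert^{2}, \dots, \lVert C_{N} \rVert^{2}] - 2 P_{i} C^{T}}$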

Extending the formula to the whole M×N distance matrix gives:
$\mathrm{dist} = \sqrt{\begin{pmatrix} \lVert P_{1} \rVert^{2} & \lVert P_{1} \rVert^{2} & \dots & \lVert P_{1} \rVert^{2} \\ \lVert P_{2} \rVert^{2} & \lVert P_{2} \rVert^{2} & \dots & \lVert P_{2} \rVert^{2} \\ \vdots & \vdots & \ddots & \vdots \\ \lVert P_{M} \rVert^{2} & \lVert P_{M} \rVert^{2} & \dots & \lVert P_{M} \rVert^{2} \end{pmatrix} + \begin{pmatrix} \lVert C_{1} \rVert^{2} & \lVert C_{2} \rVert^{2} & \dots & \lVert C_{N} \rVert^{2} \\ \lVert C_{1} \rVert^{2} & \lVert C_{2} \rVert^{2} & \dots & \lVert C_{N} \rVert^{2} \\ \vdots & \vdots & \ddots & \vdots \\ \lVert C_{1} \rVert^{2} & \lVert C_{2} \rVert^{2} & \dots & \lVert C_{N} \rVert^{2} \end{pmatrix} - 2 P C^{T}}$
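
The same result can be written more compactly. The notation here is introduced only for this remark and does not appear in the derivation above: let $p$ be the column vector of length M with $p_{i} = \lVert P_{i} \rVert^{2}$, let $c$ be the column vector of length N with $c_{j} = \lVert C_{j} \rVert^{2}$, and let $\mathbf{1}_{M}$ and $\mathbf{1}_{N}$ be all-ones column vectors of length M and N. Then

$\mathrm{dist} = \sqrt{p \mathbf{1}_{N}^{T} + \mathbf{1}_{M} c^{T} - 2 P C^{T}}$

with the square root taken element by element. The outer product $p \mathbf{1}_{N}^{T}$ is the matrix whose i-th row repeats $\lVert P_{i} \rVert^{2}$, and $\mathbf{1}_{M} c^{T}$ is the matrix whose j-th column repeats $\lVert C_{j} \rVert^{2}$. In NumPy these two matrices do not have to be built explicitly: adding an (M, 1) column to a length-N row broadcasts to the same M×N sum.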

In Python code:

def compute_distances_no_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using no explicit loops.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train)) 
    #########################################################################
    # TODO:                                                                 #
    # Compute the l2 distance between all test points and all training      #
    # points without using any explicit loops, and store the result in      #
    # dists.                                                                #
    #                                                                       #
    # You should implement this function using only basic array operations; #
    # in particular you should not use functions from scipy.                #
    #                                                                       #
    # HINT: Try to formulate the l2 distance using matrix multiplication    #
    #       and two broadcast sums.                                         #
    #########################################################################
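    # Squared norms: test points as an (M, 1) column, training points as an (N,) row
    test_sq = np.transpose([np.sum(np.square(X), axis=1)])
    train_sq = np.sum(np.square(self.X_train), axis=1)
    # Broadcasting adds the column and the row to the (M, N) cross-term matrix
    dists = np.sqrt(-2 * np.dot(X, self.X_train.T) + train_sq + test_sq)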
    #########################################################################
    #                         END OF YOUR CODE                              #
    #########################################################################
    return dists
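
To try the vectorized formula outside the class, it can be sketched as a standalone function and compared against a direct pairwise computation. The function name `euclidean_no_loops` and the random test data are hypothetical. The `np.maximum(..., 0.0)` clamp is an added precaution that the original code does not have: floating-point rounding can make a squared distance slightly negative when two points coincide or nearly coincide, and the square root of a negative number is NaN.

```python
import numpy as np

def euclidean_no_loops(X, X_train):
    """Hypothetical standalone version of the vectorized method."""
    test_sq = np.sum(np.square(X), axis=1)[:, np.newaxis]   # shape (M, 1)
    train_sq = np.sum(np.square(X_train), axis=1)           # shape (N,)
    cross = np.dot(X, X_train.T)                            # shape (M, N)
    sq_dists = test_sq + train_sq - 2 * cross               # broadcasts to (M, N)
    # Assumption: clamp tiny negative values caused by rounding before the sqrt
    return np.sqrt(np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))        # M = 4 test points, D = 3
X_train = rng.normal(size=(6, 3))  # N = 6 training points

fast = euclidean_no_loops(X, X_train)
# Reference: explicit pairwise differences, broadcast to shape (M, N, D)
diffs = X[:, np.newaxis, :] - X_train[np.newaxis, :, :]
reference = np.sqrt(np.sum(np.square(diffs), axis=2))
print(fast.shape)
print(np.allclose(fast, reference))
```

The reference computation builds an intermediate array of shape (M, N, D), which is why the matrix-product form is usually preferred for large inputs: it only ever holds M×N values.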
