The mAP Metric in Object Detection

程序员文章站 2024-03-14 21:21:41

https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173

https://github.com/rafaelpadilla/Object-Detection-Metrics

https://papers.nips.cc/paper/5867-precision-recall-gain-curves-pr-analysis-done-right

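After the evaluation dataset is fed through the network and the outputs are filtered with NMS, we obtain all_boxes: the detections for every image in the input dataset.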

def evaluate_detections(self, all_boxes, output_dir):
    '''
    :param all_boxes: all bounding boxes detected on the current dataset, as a 2-D list
           of size num_classes * num_images
                each element holds the bounding boxes detected for one class in one image
                each element is a numpy.ndarray of shape [num_boxes, 5]
                the first 4 columns are the predicted coordinates of the boxes of that foreground class
                the last column is the predicted confidence score of each box
    :param output_dir: path of the output directory (absolute path)
    :return:
    '''
    self._write_voc_results_file(all_boxes)
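    '''
    All predicted bounding boxes for the dataset are written out per class:
    the boxes of every image are stored in one text file per class.
    Each line of {class_name}.txt has the form:
    image name (without extension) confidence score xmin ymin xmax ymax
    i.e. the confidence and coordinates of one box predicted as this class in that image.
    If an image has no predicted boxes of a given foreground class, no line is written for it.
    '''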
    self._do_python_eval(output_dir)
    if self.config['matlab_eval']:
        self._do_matlab_eval(output_dir)
    if self.config['cleanup']:
        for cls in self._classes:
            if cls == '__background__':
                continue
            filename = self._get_voc_results_file_template().format(cls)
            os.remove(filename)

def _write_voc_results_file(self, all_boxes):
    for cls_ind, cls in enumerate(self.classes):
        if cls == '__background__':
            continue
        print('Writing {} VOC results file'.format(cls))
        filename = self._get_voc_results_file_template().format(cls)
        with open(filename, 'wt') as f:
            for im_ind, index in enumerate(self.image_index):
                dets = all_boxes[cls_ind][im_ind]
                if len(dets) == 0:
                    continue
                # the VOCdevkit expects 1-based indices
                for k in range(dets.shape[0]):
                    f.write(
                        '{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.format(
                            index, dets[k, -1], dets[k, 0] + 1,
                            dets[k, 1] + 1, dets[k, 2] + 1,
                            dets[k, 3] + 1))

def _do_python_eval(self, output_dir='output'):
    annopath = os.path.join(self._devkit_path, 'VOC' + self._year,
                            'Annotations', '{:s}.xml')
    # annopath: path template of the ground truth annotation files of the dataset
    imagesetfile = os.path.join(self._devkit_path, 'VOC' + self._year,
                                'ImageSets', 'Main',
                                self._image_set + '.txt')
    # imagesetfile: lists the names of all images in the dataset (without extension)
    cachedir = os.path.join(self._devkit_path, 'annotations_cache')
    aps = []
    # The PASCAL VOC metric changed in 2010
    use_07_metric = True if int(self._year) < 2010 else False
    print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
    if not os.path.isdir(output_dir):
        os.mkdir(output_dir)
    for i, cls in enumerate(self._classes):
        if cls == '__background__':
            continue
        filename = self._get_voc_results_file_template().format(cls)
        
        # Get the results txt file for this foreground class; it contains the
        # bounding boxes predicted for this class on every image of the dataset
        
        rec, prec, ap = voc_eval(
            filename,
            annopath,
            imagesetfile,
            cls,
            cachedir,
            ovthresh=0.5,
            use_07_metric=use_07_metric,
            use_diff=self.config['use_diff'])
        
        # compute the AP of the current foreground class
        
        aps += [ap]
        print(('AP for {} = {:.4f}'.format(cls, ap)))
        with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f:
            pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f)
    print(('Mean AP = {:.4f}'.format(np.mean(aps))))
    print('~~~~~~~~')
    print('Results:')
    for ap in aps:
        print(('{:.3f}'.format(ap)))
    print(('{:.3f}'.format(np.mean(aps))))
    print('~~~~~~~~')
    print('')
    print('--------------------------------------------------------------')
    print('Results computed with the **unofficial** Python eval code.')
    print('Results should be very close to the official MATLAB eval code.')
    print('Recompute with `./tools/reval.py --matlab ...` for your paper.')
    print('-- Thanks, The Management')
    print('--------------------------------------------------------------')

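To compute the model's mAP on a dataset, AP is computed separately for each foreground class and the per-class AP values are then averaged. That is why _write_voc_results_file is called first, storing the predicted bounding boxes of every image in one classname.txt file per class, and _do_python_eval is then called to compute the AP of each class. The part to focus on is voc_eval, which _do_python_eval calls to compute rec, prec, and ap for each class.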
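
As a rough illustration of the per-class results file, the sketch below writes one detection in the line format used by _write_voc_results_file and reads it back the way voc_eval does. The helper names write_detections and read_detections, the in-memory buffer, and the sample box are assumptions made for this example; they are not part of the original code.

```python
import io
import numpy as np

def write_detections(f, image_id, dets):
    """Write rows of [xmin, ymin, xmax, ymax, score] in the VOC results format."""
    for k in range(dets.shape[0]):
        # The VOC devkit expects 1-based pixel coordinates, hence the + 1.
        f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.format(
            image_id, dets[k, -1],
            dets[k, 0] + 1, dets[k, 1] + 1, dets[k, 2] + 1, dets[k, 3] + 1))

def read_detections(f):
    """Parse lines of 'imagename confidence xmin ymin xmax ymax'."""
    rows = [line.strip().split(' ') for line in f if line.strip()]
    image_ids = [r[0] for r in rows]
    confidence = np.array([float(r[1]) for r in rows])
    boxes = np.array([[float(v) for v in r[2:]] for r in rows])
    return image_ids, confidence, boxes

# Hypothetical detection: one box with score 0.9 in image '000001'.
buf = io.StringIO()
write_detections(buf, '000001', np.array([[10., 20., 50., 80., 0.9]]))
buf.seek(0)
image_ids, confidence, boxes = read_detections(buf)
print(image_ids, confidence, boxes)
```

Note that the coordinates read back are shifted by one pixel relative to the 0-based boxes that were written.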

def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=False,
             use_diff=False):
    """rec, prec, ap = voc_eval(detpath,
                              annopath,
                              imagesetfile,
                              classname,
                              [ovthresh],
                              [use_07_metric])

  Top level function that does the PASCAL VOC evaluation.

  detpath: Path to detections
      detpath.format(classname) should produce the detection results file.
      This txt file holds all detections of the current foreground class over the dataset.
      The results directory must contain one txt file per class (7 classes give 7 files;
      the file name should contain classname), and each line is one predicted box of that class.
      Here detpath refers to the detections of a single foreground class,
      i.e. the boxes predicted for this class on every image of the dataset.
      Line format:
      imagename confidence xmin ymin xmax ymax
      imagename ...

  annopath: Path to annotations
      annopath.format(imagename) should be the xml annotations file.
      Path template of the ground truth annotation files (the xml file name should contain imagename).
      Every image in the test set has a corresponding xml label file, with entries of the form
      classname xmin ymin xmax ymax

  imagesetfile: Text file containing the list of images, one image per line.
      A txt file containing the names of all input images.
  classname: Category name (duh), i.e. the current foreground class
  cachedir: Directory for caching the annotations (a pickle of all ground truth boxes)
  [ovthresh]: Overlap threshold (default = 0.5); a detection with IoU above it counts as correct
  [use_07_metric]: Whether to use VOC07's 11 point AP computation
      (default False, i.e. not used)
  """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    cachefile = os.path.join(cachedir, '%s_annots.pkl' % imagesetfile)
    # read list of images (the test image names)
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()  # read the names of all images to evaluate
    imagenames = [x.strip() for x in lines]  # image names stored in the list imagenames

    if not os.path.isfile(cachefile):
        # if the cache file does not exist, load the annotations from the original dataset
        # load annotations
        recs = {}
        '''
        recs is a dict keyed by the file name of each image in the test set.
        Each value is the list obtained by parsing that image's annotation (xml) file:
        one dict per ground truth box in the image, of any class, giving the
        class name and coordinates of that box.
        So recs has one entry per image in the dataset, and the list stored for an
        image has as many elements as the image has ground truth boxes.
        '''
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            # if i % 100 == 0:
            #     print('Reading annotation for {:d}/{:d}'.format(
            #         i + 1, len(imagenames)))
        # save
        print('Saving cached annotations to {:s}'.format(cachefile))
        with open(cachefile, 'wb') as f:  # save the recs dict to the cache file
            pickle.dump(recs, f)
            # store the ground truth boxes of every image on disk
    else:
        # load
        with open(cachefile, 'rb') as f:
            try:
                recs = pickle.load(f)  # the cache file already exists, so load it into recs
            except:
                recs = pickle.load(f, encoding='bytes')
    '''load the file containing the ground truth boxes'''

    class_recs = {}
    npos = 0
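    '''
    # extract gt objects for this class
    # Annotations are gathered per class: recall, precision, and AP are all computed per class.
    class_recs = {}  # annotations of the current class
    '''
    for imagename in imagenames:  # iterate over every image in the dataset
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        '''
        recs[imagename] holds the ground truth boxes of all classes in the current image.
        Only the class being evaluated is kept:
        R is the list of ground truth boxes of class classname in this image,
        each element being a dict describing one such box.
        '''
        bbox = np.array([x['bbox'] for x in R])

        # Coordinates of all ground truth boxes of the current class in this image, as a numpy.ndarray
        # bbox shape [number of ground truth boxes of this class in the image, 4]

        if use_diff:
            # np.bool was removed in NumPy 1.24; the builtin bool is equivalent here
            difficult = np.array([False for x in R]).astype(bool)

            # Treat every object as not difficult, so difficult objects also count toward the metric

        else:
            difficult = np.array([x['difficult'] for x in R]).astype(bool)

        # len(R) is the number of gt objects of this class; det records whether each was detected, initially False
        det = [False] * len(R)
        # accumulate the number of non-difficult objects; without difficult objects npos equals the gt count
        npos = npos + sum(~difficult)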
        class_recs[imagename] = {
            'bbox': bbox,
            'difficult': difficult,
            'det': det
        }
    '''
    Note: after this loop, class_recs holds the ground truth boxes of class classname
    for all validation images (one entry per image, for the current class).

    Each key is an image name; its value is a dict initialized with:

    'bbox': coordinates of all ground truth boxes of this class in the image, shape [num_gt_boxes, 4]
    'difficult': bool array, shape [num_gt_boxes,]
    'det': list of False, length num_gt_boxes; whether each ground truth box has been matched yet

    bbox and difficult are numpy.ndarray.
    So class_recs stores the ground truth information of the current class for all validation images.
    '''

    # read dets
    detfile = detpath.format(classname)

    # Read the predicted boxes of the current class for all images: the classname.txt file
    # each line of the txt file:
    # imagename confidence bbox_xmin  bbox_ymin bbox_xmax bbox_ymax

    with open(detfile, 'r') as f:
        lines = f.readlines()
        # each line gives the confidence and coordinates of one box of this class in some image

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]
    confidence = np.array([float(x[1]) for x in splitlines])
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
    '''
    These lines read all bounding boxes the model predicted for the current class,
    spread over the validation images, i.e. every box of this class predicted on the whole dataset.

    At this point we have

    class_recs = {}   ground truth boxes of the current class for all images
    each key is an image; each value is a dict with that image's ground truth for this class:
    'bbox': bbox, shape [num_gt_boxes, 4]
    'difficult': difficult, shape [num_gt_boxes,]
    'det': det, all False, length num_gt_boxes

    and the predicted boxes of the current class over all images, with their confidences:
    image_ids    length num_all_pred          image name of each predicted box
    BB           shape [num_all_pred, 4]      coordinates of each predicted box
    confidence   shape [num_all_pred,]        confidence of each predicted box
    '''

    nd = len(image_ids)

    # nd is the total number of predicted boxes of the current class over the dataset.
    # It is not the number of validation images: one image may contain several predicted boxes of this class.

    tp = np.zeros(nd)  # true positive flag of each predicted box
    fp = np.zeros(nd)  # false positive flag of each predicted box

    # Every box in BB ends up as TP or FP, except boxes whose best match is a
    # 'difficult' ground truth box, which are ignored (neither TP nor FP).

    if BB.shape[0] > 0:  # the model predicted at least one box of this class on the dataset
        # sort by confidence
        sorted_ind = np.argsort(-confidence)
        sorted_scores = np.sort(-confidence)
        BB = BB[sorted_ind, :]
        image_ids = [image_ids[x] for x in sorted_ind]
        '''
        np.argsort(-confidence) sorts the confidences of all predicted boxes of this
        class in descending order (highest confidence first); BB and image_ids are
        reordered accordingly.
        '''

        # go down dets and mark TPs and FPs
        for d in range(nd):

            # For each predicted box of the current class (over all validation images,
            # in descending confidence order), decide whether it is a true positive
            # or a false positive.

            R = class_recs[image_ids[d]]

            # R holds the ground truth boxes of the current class in the image this box belongs to

            bb = BB[d, :].astype(float)

            # the d-th detected box, bb shape [4,]

            ovmax = -np.inf
            BBGT = R['bbox'].astype(float)

            # all ground truth boxes of class classname in this image

            # shape [#gt_boxes,4]
            if BBGT.size > 0:
                # compute overlaps
                # intersection
                '''
                For each predicted box we must decide whether it is a TP or an FP.
                First take all ground truth boxes of the current class in its image,
                then compute the IoU between the predicted box and each of them.

                np.maximum(X, Y, out=None) compares X and Y element-wise and keeps
                the larger value; it takes at least two arguments.
                '''
                ixmin = np.maximum(BBGT[:, 0], bb[0])
                iymin = np.maximum(BBGT[:, 1], bb[1])
                ixmax = np.minimum(BBGT[:, 2], bb[2])
                iymax = np.minimum(BBGT[:, 3], bb[3])
                iw = np.maximum(ixmax - ixmin + 1., 0.)
                ih = np.maximum(iymax - iymin + 1., 0.)
                inters = iw * ih

                # union
                uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                       (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                       (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

                overlaps = inters / uni
                ovmax = np.max(overlaps)
                jmax = np.argmax(overlaps)

                # IoU between the predicted box and every ground truth box of this class in the image; keep the largest

            # If the largest overlap exceeds the threshold, the box is a candidate positive
            if ovmax > ovthresh:
                if not R['difficult'][jmax]:
                    if not R['det'][jmax]:
                        # the ground truth box has not been matched yet:
                        # each ground truth box may match only one predicted box
                        tp[d] = 1.
                        R['det'][jmax] = 1
                    else:
                        # the ground truth box was already claimed by a higher-scoring box
                        fp[d] = 1.
            else:
                fp[d] = 1.
                # The largest IoU with the ground truth boxes of this class is not above the
                # threshold, or the image has no ground truth box of this class:
                # the predicted box is an FP

    # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    rec = tp / float(npos)   #recall = TP/(TP+FN)=TP/(all gt)
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)

    #precision=TP/(TP+FP)=TP/(all prediction)

    ap = voc_ap(rec, prec, use_07_metric)

    # rec numpy.ndarray [num_all_pred_boxes,]
    # prec numpy.ndarray [num_all_pred_boxes,]
    # num_all_pred_boxes is the number of boxes predicted for the current class over all images

    return rec, prec, ap

def voc_ap(rec, prec, use_07_metric=False):
    '''
    :param rec: numpy.ndarray [#num_all_pred_boxes,]
    :param prec: numpy.ndarray [#num_all_pred_boxes,]
    :param use_07_metric:
    :return:
    '''
    """ ap = voc_ap(rec, prec, [use_07_metric])
  Compute VOC AP given precision and recall.
  If use_07_metric is true, uses the
  VOC 07 11 point method (default:False).
  """
    if use_07_metric:
        # prec and rec define the PR curve: recall on the x axis, precision on the y axis
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:  # no point on the curve reaches recall t
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
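
The TP/FP marking loop inside voc_eval can be isolated as a small sketch. It is deliberately simplified and rests on assumptions not in the original: all detections belong to a single image, the IoU values are given as a precomputed matrix, and the 'difficult' flag is ignored. The function name mark_tp_fp and the sample numbers are hypothetical.

```python
import numpy as np

def mark_tp_fp(scores, ious, iou_thresh=0.5):
    """Greedy matching for one image and one class.

    scores: [num_det] confidence of each detection
    ious:   [num_det, num_gt] IoU between every detection and every gt box
    Returns tp and fp flags, ordered by descending confidence.
    """
    order = np.argsort(-scores)                  # highest confidence first
    matched = np.zeros(ious.shape[1], dtype=bool)
    tp = np.zeros(len(scores))
    fp = np.zeros(len(scores))
    for rank, d in enumerate(order):
        if ious.shape[1] == 0:                   # no gt box of this class
            fp[rank] = 1.
            continue
        j = int(np.argmax(ious[d]))              # best-overlapping gt box
        if ious[d, j] > iou_thresh and not matched[j]:
            tp[rank] = 1.
            matched[j] = True                    # a gt box can be matched once
        else:
            fp[rank] = 1.
    return tp, fp

# Three detections, two gt boxes (made-up IoU values).
scores = np.array([0.9, 0.8, 0.7])
ious = np.array([[0.7, 0.1],
                 [0.6, 0.2],
                 [0.0, 0.55]])
tp, fp = mark_tp_fp(scores, ious)
print(tp, fp)
```

The second detection overlaps the first gt box well enough, but that box was already claimed by the higher-scoring detection, so it is counted as a false positive.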

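Now consider a single foreground class. Given the predicted boxes and ground truth boxes of that class over all input images, how is the AP of the class computed?

AP is defined as the area under the PR curve (area under curve) and takes values in [0, 1].

P is precision = TP/(TP+FP) = TP/num_pred_boxes: the fraction of predicted bounding boxes that are correct.

R is recall = TP/(TP+FN) = TP/num_gt_boxes: the fraction of ground truth boxes that the predictions recover.

How do we decide whether a predicted bounding box is a TP or an FP? This is where IoU comes in. An IoU threshold is set first (usually 0.5): a predicted box whose IoU with a ground truth box exceeds 0.5 is a TP, provided that ground truth box has not already been matched; otherwise it is an FP.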
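
A minimal sketch of the IoU computation, following the same pixel convention as voc_eval (box width is xmax - xmin + 1). The function name iou_one_to_many and the sample boxes are assumptions chosen for illustration.

```python
import numpy as np

def iou_one_to_many(box, gt_boxes):
    """IoU between one box and an array of gt boxes, all as [xmin, ymin, xmax, ymax]."""
    gt_boxes = np.asarray(gt_boxes, dtype=float).reshape(-1, 4)
    # intersection rectangle
    ixmin = np.maximum(gt_boxes[:, 0], box[0])
    iymin = np.maximum(gt_boxes[:, 1], box[1])
    ixmax = np.minimum(gt_boxes[:, 2], box[2])
    iymax = np.minimum(gt_boxes[:, 3], box[3])
    iw = np.maximum(ixmax - ixmin + 1., 0.)      # clamp to 0 when boxes are disjoint
    ih = np.maximum(iymax - iymin + 1., 0.)
    inters = iw * ih
    # union = area(box) + area(gt) - intersection
    box_area = (box[2] - box[0] + 1.) * (box[3] - box[1] + 1.)
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0] + 1.) * (gt_boxes[:, 3] - gt_boxes[:, 1] + 1.)
    return inters / (box_area + gt_area - inters)

pred = np.array([0., 0., 9., 9.])                # a 10 x 10 box
gts = np.array([[0., 0., 9., 9.],                # identical box
                [5., 0., 14., 9.],               # overlaps half of pred
                [20., 20., 29., 29.]])           # disjoint
ious = iou_one_to_many(pred, gts)
print(ious)
```

With a threshold of 0.5, only the first ground truth box would qualify as a match for this prediction: the half-overlapping box has an intersection of 50 and a union of 150, giving an IoU of 1/3.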

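AP is computed as follows:
(1) Sort all predicted bounding boxes of the current class, over all images, by confidence score from high to low (at this stage it does not matter which image a box comes from).

(2) For each predicted box, take all ground truth boxes of the current class in its image, compute the IoU between the predicted box and each of them, and keep the largest IoU and the corresponding gt box. If that gt box has not yet been matched by another predicted box ('det'=False) and the largest IoU exceeds threshold=0.5, the predicted box gets TP=1; otherwise FP=1. (In the code, a box whose best match is a 'difficult' gt box is counted as neither.)

(3) If there are N predicted boxes over all images, this yields two numpy arrays of shape [N,], tp and fp, which record whether each predicted box is a TP or an FP.

(4) From these arrays, compute the prec and rec arrays and plot the corresponding points of the PR curve.

This step is crucial. As an example, suppose all_gt=5, i.e. the dataset contains 5 ground truth boxes of the current foreground class in total (regardless of which images they are in), and the boxes sorted by confidence score (all_pred_boxes=10) give the following.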

Then: TP=np.array([1,1,0,0,0,1,1,0,0,1])

FP=np.array([0,0,1,1,1,0,0,1,1,0])

The (recall, prec) points of the PR curve follow from the TP and FP arrays.

TP=np.cumsum(TP)=np.array([1,2,2,2,2,3,4,4,4,5])

FP=np.cumsum(FP)=np.array([0,0,1,2,3,3,3,4,5,5])

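recall=TP/(TP+FN). Note that both recall and precision are computed cumulatively: each point uses only the TP/FP decisions for the boxes up to and including the current one, and knows nothing about whether the lower-confidence boxes that follow are TP or FP.

For example, at rank 2 the cumulative TP is 2 and there are 5 ground truth boxes in total, so 3 gt boxes have not been found yet, i.e. FN=3, and recall=2/(2+3)=0.4. FP=0, so precision=1.

When the current box is a TP, recall always increases (same denominator, larger numerator), while precision increases (if FP!=0) or stays the same (if FP=0).

When the current box is an FP, precision decreases (FP grows, so the denominator grows while the numerator stays the same; it stays at 0 if TP=0) and recall is unchanged.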

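With recall on the x axis and precision on the y axis, the curve can therefore only move straight down (an FP) or up and to the right (a TP), which gives the PR curve its zigzag shape. This is why the way the area under the PR curve is computed matters.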

(5) Plot the resulting PR curve and integrate the area under it.

import numpy as np
import matplotlib.pyplot as plt

tp = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1])
fp = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
fp = np.cumsum(fp)
tp = np.cumsum(tp)

rec = tp / float(5)   # recall = TP/(TP+FN) = TP/(all gt)
# avoid divide by zero in case the first detection matches a difficult
# ground truth
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)

# plot the PR curve
plt.plot(rec, prec, lw=2, label='Precision-recall curve')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.grid(True)
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.title('Precision-Recall')
plt.legend(loc="upper right")
plt.show()
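
The zigzag can be removed before integrating. The sketch below applies the area-based branch of voc_ap (the one used when use_07_metric is False) to the same ten detections: it replaces each precision value by the maximum precision at any equal or higher recall, then sums the rectangles where recall changes. Using np.maximum.accumulate in place of the explicit backward loop is a substitution made here, and the variable names are illustrative.

```python
import numpy as np

tp = np.cumsum(np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1]))
fp = np.cumsum(np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0]))
rec = tp / 5.0                                   # 5 ground truth boxes in total
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)

# sentinel values at both ends, as in voc_ap
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))

# precision envelope: running maximum taken from right to left
mpre = np.maximum.accumulate(mpre[::-1])[::-1]

# indices where recall changes value
idx = np.where(mrec[1:] != mrec[:-1])[0]

# sum of (delta recall) * envelope precision
ap_area = np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])
print('{:.4f}'.format(ap_area))  # 0.7286
```

For this example the envelope precision is 1 up to recall 0.4, 4/7 up to recall 0.8, and 0.5 up to recall 1.0, so the area is 0.2 * (1 + 1 + 4/7 + 4/7 + 0.5).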

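There are two common ways to compute the area under the PR curve:

1. 11-point (VOC 2007)

Split the recall axis into 10 equal intervals, take the precision at the 11 recall levels 0, 0.1, ..., 1.0, sum these values, and divide by 11.

The precision at recall level t is the maximum precision over all points whose recall is at least t.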

ap = 0.
for t in np.arange(0., 1.1, 0.1):
    if np.sum(rec >= t) == 0:  # no point on the curve reaches recall t
        p = 0
    else:
        p = np.max(prec[rec >= t])
    ap = ap + p / 11.
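
As a worked check, the 11-point rule can be run on the ten-detection example with 5 ground truth boxes. This is a sketch under that example's assumptions; the name ap_11pt is introduced here for illustration.

```python
import numpy as np

tp = np.cumsum(np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1]))
fp = np.cumsum(np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0]))
rec = tp / 5.0                                   # 5 ground truth boxes in total
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)

ap_11pt = 0.
for t in np.arange(0., 1.1, 0.1):
    mask = rec >= t
    # best precision among points with recall >= t, or 0 if there are none
    p = np.max(prec[mask]) if mask.any() else 0.
    ap_11pt += p / 11.
print('{:.4f}'.format(ap_11pt))  # 0.7532
```

Here the interpolated precision is 1 for the recall levels 0 to 0.4, 4/7 for 0.5 to 0.8, and 0.5 for 0.9 and 1.0, giving (5 + 4 * 4/7 + 2 * 0.5) / 11.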
