选择性搜索（selective search）

程序员文章站 2022-07-05 10:18:06

...

https://blog.csdn.net/guoyunfei20/article/details/78723646

https://blog.csdn.net/Tomxiaodai/article/details/81412354

https://blog.csdn.net/yuanlulu/article/details/82157071

该文翻译整理自：selective search for object detection（c++ / python）

一、目标检测 VS 目标识别

目标识别（objec recognition）是指明一幅输入图像中包含那类目标。其输入为一幅图像，输出是该图像中的目标属于哪个类别（class probability）。而目标检测（object detection）除了要告诉输入图像中包含了哪类目前外，还要框出该目标的具体位置（bounding boxes）。

在目标检测时，为了定位到目标的具体位置，通常会把图像分成许多子块（sub-regions / patches），然后把子块作为输入，送到目标识别的模型中。分子块的最直接方法叫滑动窗口法（sliding window approach）。滑动窗口的方法就是按照子块的大小在整幅图像上穷举所有子图像块。这种方法产生的数据量想想都头大。和滑动窗口法相对的是另外一类基于区域（region proposal）的方法。selective search就是其中之一！

二、selective search算法流程

选择性搜索（selective search）

step0：生成区域集R，具体参见论文《Efficient Graph-Based Image Segmentation》

step1：计算区域集R里每个相邻区域的相似度S={s1,s2,…}
step2：找出相似度最高的两个区域，将其合并为新集，添加进R
step3：从S中移除所有与step2中有关的子集
step4：计算新集与所有子集的相似度
step5：跳至step2，直至S为空

三、相似度计算

论文考虑了颜色、纹理、尺寸和空间交叠这4个参数。

3.1、颜色相似度（color similarity）
将色彩空间转为HSV，每个通道下以bins=25计算直方图，这样每个区域的颜色直方图有25*3=75个区间。对直方图除以区域尺寸做归一化后使用下式计算相似度：

选择性搜索（selective search）

3.2、纹理相似度（texture similarity）

论文采用方差为1的高斯分布在8个方向做梯度统计，然后将统计结果（尺寸与区域大小一致）以bins=10计算直方图。直方图区间数为8*3*10=240（使用RGB色彩空间）。

选择性搜索（selective search）

其中，选择性搜索（selective search）是直方图中第个bin的值。

3.3、尺寸相似度（size similarity）

选择性搜索（selective search）

保证合并操作的尺度较为均匀，避免一个大区域陆续“吃掉”其他小区域。

例：设有区域a-b-c-d-e-f-g-h。较好的合并方式是：ab-cd-ef-gh -> abcd-efgh -> abcdefgh。不好的合并方法是：ab-c-d-e-f-g-h ->abcd-e-f-g-h ->abcdef-gh -> abcdefgh。

3.4、交叠相似度（shape compatibility measure）

选择性搜索（selective search）

3.5、最终的相似度

选择性搜索（selective search）

四、OpenCV 3.3 实现了selective search

在OpenCV的contrib模块中实现了selective search算法。类定义为：

cv::ximgproc::segmentation::SelectiveSearchSegmentation

举例：




#include "opencv2/ximgproc/segmentation.hpp"



#include "opencv2/highgui.hpp"



#include "opencv2/core.hpp"



#include "opencv2/imgproc.hpp"



#include <iostream>



#include <ctime>



 




using namespace cv;




using namespace cv::ximgproc::segmentation;



 




static void help() {



    std::cout << std::endl <<



    "Usage:" << std::endl <<



    "./ssearch input_image (f|q)" << std::endl <<



    "f=fast, q=quality" << std::endl <<



    "Use l to display less rects, m to display more rects, q to quit" << std::endl;



}



 



 




int main(int argc, char** argv) {



    // If image path and f/q is not passed as command




    // line arguments, quit and display help message




    if (argc < 3) {



        help();



        return -1;



    }



 



    // speed-up using multithreads




    // void cv::setUseOptimized(bool onoff), Enables or disables the optimized code.




    setUseOptimized(true);



    setNumThreads(4);



 



    // read image




    Mat im = imread(argv[1]);



    // resize image




    int newHeight = 200;



    int newWidth = im.cols*newHeight/im.rows;



    resize(im, im, Size(newWidth, newHeight));



 



    // create Selective Search Segmentation Object using default parameters




    Ptr<SelectiveSearchSegmentation> ss = createSelectiveSearchSegmentation();



    // set input image on which we will run segmentation




    ss->setBaseImage(im);



 



    // Switch to fast but low recall Selective Search method




    if (argv[2][0] == 'f') {



        ss->switchToSelectiveSearchFast();



    }



    // Switch to high recall but slow Selective Search method




    else if (argv[2][0] == 'q') {



        ss->switchToSelectiveSearchQuality();



    } 



    // if argument is neither f nor q print help message




    else {



        help();



        return -2;



    }



 



    // run selective search segmentation on input image




    std::vector<Rect> rects;



    ss->process(rects);



    std::cout << "Total Number of Region Proposals: " << rects.size() << std::endl;



 



    // number of region proposals to show




    int numShowRects = 100;



    // increment to increase/decrease total number of reason proposals to be shown




    int increment = 50;



 



    while(1) {



        // create a copy of original image




        Mat imOut = im.clone();



 



        // itereate over all the region proposals




        for(int i = 0; i < rects.size(); i++) {



            if (i < numShowRects) {



                rectangle(imOut, rects[i], Scalar(0, 255, 0));



            }



            else {



                break;



            }



        }



 



        // show output




        imshow("Output", imOut);



 



        // record key press




        int k = waitKey();



 



        // m is pressed




        if (k == 109) {



            // increase total number of rectangles to show by increment




            numShowRects += increment;



        }



        // l is pressed




        else if (k == 108 && numShowRects > increment) {



            // decrease total number of rectangles to show by increment




            numShowRects -= increment;



        }



        // q is pressed




        else if (k == 113) {



            break;



        }



    }



    return 0;



}

上边代码git地址：https://code.csdn.net/guoyunfei20/selective_search_opencv_demo.git（运行需要安装OpenCV3.0以上 + contrib）

1 前言

2 Selective Search算法

3 Python源码分析

1 前言

在目标检测时，为了定位到目标的具体位置，通常会把图像分成许多子块（sub-regions / patches），然后把子块作为输入，送到目标识别的模型中。selective search就是一种选择子块的启发式方法。

选择性搜索（selective search）

2 Selective Search算法

主要思路：输入一张图片，首先通过图像分割的方法（如大名鼎鼎的felzenszwalb算法）获得很多小的区域，然后对这些小的区域不断进行合并，一直到无法合并为止。此时这些原始的小区域和合并得到的区域的就是我们得到的bounding box.

选择性搜索（selective search）

算法分为如下几个大步：

1. 生成原始的区域集R（利用felzenszwalb算法）

2. 计算区域集R里每个相邻区域的相似度S={s1,s2,…}

3. 找出相似度最高的两个区域，将其合并为新集，添加进R

4. 从S中移除所有与第3步中有关的子集

5. 计算新集与所有子集的相似度

6.跳至第三步，不断循环，合并，直至S为空（到不能再合并时为止）

3 Python源码分析

Github上有一个选择性搜索的简单实现 -- selectivesearch ，可以帮助大家理解





import skimage.data





import selectivesearch



 



img = skimage.data.astronaut()



img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)



regions[:10]



=>



[{'labels': [0.0], 'rect': (0, 0, 15, 24), 'size': 260},



 {'labels': [1.0], 'rect': (13, 0, 1, 12), 'size': 23},



 {'labels': [2.0], 'rect': (0, 15, 15, 11), 'size': 30},



 {'labels': [3.0], 'rect': (15, 14, 0, 0), 'size': 1},



 {'labels': [4.0], 'rect': (0, 0, 61, 153), 'size': 4927},



 {'labels': [5.0], 'rect': (0, 12, 61, 142), 'size': 177},



 {'labels': [6.0], 'rect': (7, 54, 6, 17), 'size': 8},



 {'labels': [7.0], 'rect': (28, 50, 18, 32), 'size': 22},



 {'labels': [8.0], 'rect': (2, 99, 7, 24), 'size': 24},



 {'labels': [9.0], 'rect': (14, 118, 79, 117), 'size': 4008}]

选择性搜索（selective search）

1. 用户生成原始区域集的函数，其中用到了felzenszwalb图像分割算法。每一个区域都有一个编号，将编号并入图片中，方便后面的操作




def _generate_segments(im_orig, scale, sigma, min_size):



    """




        segment smallest regions by the algorithm of Felzenswalb and



        Huttenlocher



    """



 



    # open the Image




    im_mask = skimage.segmentation.felzenszwalb(



        skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,



        min_size=min_size)



 



    # merge mask channel to the image as a 4th channel




    im_orig = numpy.append(



        im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)



    im_orig[:, :, 3] = im_mask



 



    return im_orig

2. 计算两个区域的相似度
论文中考虑了四种相似度 -- 颜色，纹理，尺寸，以及交叠。

其中颜色和纹理相似度，通过获取两个区域的直方图的交集，来判断相似度。

最后的相似度是四种相似度的加和。




def _sim_colour(r1, r2):



    """




        calculate the sum of histogram intersection of colour



    """



    return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])



 



 



def _sim_texture(r1, r2):



    """




        calculate the sum of histogram intersection of texture



    """



    return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])



 



 



def _sim_size(r1, r2, imsize):



    """




        calculate the size similarity over the image



    """



    return 1.0 - (r1["size"] + r2["size"]) / imsize



 



 



def _sim_fill(r1, r2, imsize):



    """




        calculate the fill similarity over the image



    """



    bbsize = (



        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))



        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))



    )



    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize



 



 



def _calc_sim(r1, r2, imsize):



    return (_sim_colour(r1, r2) + _sim_texture(r1, r2)



            + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))

3. 用于计算颜色和纹理的直方图的函数

颜色直方图：将色彩空间转为HSV，每个通道下以bins=25计算直方图，这样每个区域的颜色直方图有25*3=75个区间。对直方图除以区域尺寸做归一化后使用下式计算相似度：

选择性搜索（selective search）

纹理相似度：论文采用方差为1的高斯分布在8个方向做梯度统计，然后将统计结果（尺寸与区域大小一致）以bins=10计算直方图。直方图区间数为8*3*10=240（使用RGB色彩空间）。这里是用了LBP（local binary pattern）获取纹理特征，建立直方图，其余相同

选择性搜索（selective search）

其中，选择性搜索（selective search）是直方图中第个bin的值。




def _calc_colour_hist(img):



    """




        calculate colour histogram for each region



        the size of output histogram will be BINS * COLOUR_CHANNELS(3)



        number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]



        extract HSV



    """



 



    BINS = 25




    hist = numpy.array([])



 



    for colour_channel in (0, 1, 2):



 



        # extracting one colour channel




        c = img[:, colour_channel]



 



        # calculate histogram for each colour and join to the result




        hist = numpy.concatenate(



            [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])



 



    # L1 normalize




    hist = hist / len(img)



 



    return hist



 



 



def _calc_texture_gradient(img):



    """




        calculate texture gradient for entire image



        The original SelectiveSearch algorithm proposed Gaussian derivative



        for 8 orientations, but we use LBP instead.



        output will be [height(*)][width(*)]



    """



    ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))



 



    for colour_channel in (0, 1, 2):



        ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(



            img[:, :, colour_channel], 8, 1.0)



 



    return ret



 



 



def _calc_texture_hist(img):



    """




        calculate texture histogram for each region



        calculate the histogram of gradient for each colours



        the size of output histogram will be



            BINS * ORIENTATIONS * COLOUR_CHANNELS(3)



    """



    BINS = 10




 



    hist = numpy.array([])



 



    for colour_channel in (0, 1, 2):



 



        # mask by the colour channel




        fd = img[:, colour_channel]



 



        # calculate histogram for each orientation and concatenate them all




        # and join to the result




        hist = numpy.concatenate(



            [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])



 



    # L1 Normalize




    hist = hist / len(img)



 



    return hist

4. 提取区域的尺寸，颜色和纹理特征




def _extract_regions(img):



 



    R = {}



 



    # get hsv image




    hsv = skimage.color.rgb2hsv(img[:, :, :3])



 



    # pass 1: count pixel positions




    for y, i in enumerate(img):



 



        for x, (r, g, b, l) in enumerate(i):



 



            # initialize a new region




            if l not in R:



                R[l] = {



                    "min_x": 0xffff, "min_y": 0xffff,



                    "max_x": 0, "max_y": 0, "labels": [l]}



 



            # bounding box




            if R[l]["min_x"] > x:



                R[l]["min_x"] = x



            if R[l]["min_y"] > y:



                R[l]["min_y"] = y



            if R[l]["max_x"] < x:



                R[l]["max_x"] = x



            if R[l]["max_y"] < y:



                R[l]["max_y"] = y



 



    # pass 2: calculate texture gradient




    tex_grad = _calc_texture_gradient(img)



 



    # pass 3: calculate colour histogram of each region




    for k, v in list(R.items()):



 



        # colour histogram




        masked_pixels = hsv[:, :, :][img[:, :, 3] == k]



        R[k]["size"] = len(masked_pixels / 4)



        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)



 



        # texture histogram




        R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])



 



    return R

5. 找邻居 -- 通过计算每个区域与其余的所有区域是否有相交，来判断是不是邻居




def _extract_neighbours(regions):



 



    def intersect(a, b):




        if (a["min_x"] < b["min_x"] < a["max_x"]



                and a["min_y"] < b["min_y"] < a["max_y"]) or (



            a["min_x"] < b["max_x"] < a["max_x"]



                and a["min_y"] < b["max_y"] < a["max_y"]) or (



            a["min_x"] < b["min_x"] < a["max_x"]



                and a["min_y"] < b["max_y"] < a["max_y"]) or (



            a["min_x"] < b["max_x"] < a["max_x"]



                and a["min_y"] < b["min_y"] < a["max_y"]):



            return True




        return False




 



    R = list(regions.items())



    neighbours = []



    for cur, a in enumerate(R[:-1]):



        for b in R[cur + 1:]:



            if intersect(a[1], b[1]):



                neighbours.append((a, b))



 



    return neighbours

6. 合并两个区域的函数




def _merge_regions(r1, r2):



    new_size = r1["size"] + r2["size"]



    rt = {



        "min_x": min(r1["min_x"], r2["min_x"]),



        "min_y": min(r1["min_y"], r2["min_y"]),



        "max_x": max(r1["max_x"], r2["max_x"]),



        "max_y": max(r1["max_y"], r2["max_y"]),



        "size": new_size,



        "hist_c": (



            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,



        "hist_t": (



            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,



        "labels": r1["labels"] + r2["labels"]



    }



    return rt

7. 主函数 -- Selective Search

scale：图像分割的集群程度。值越大，意味集群程度越高，分割的越少，获得子区域越大。默认为1

signa: 图像分割前，会先对原图像进行高斯滤波去噪，sigma即为高斯核的大小。默认为0.8

min_size : 最小的区域像素点个数。当小于此值时，图像分割的计算就停止，默认为20

每次选出相似度最高的一组区域（如编号为100和120的区域），进行合并，得到新的区域（如编号为300）。然后计算新的区域300与区域100的所有邻居和区域120的所有邻居的相似度，加入区域集S。不断循环，知道S为空，此时最后只剩下一个区域，而且它的像素数会非常大，接近原始图片的像素数，因此无法继续合并。最后退出程序。




def selective_search(




        im_orig, scale=1.0, sigma=0.8, min_size=50):



    '''Selective Search




    Parameters



    ----------



        im_orig : ndarray



            Input image



        scale : int



            Free parameter. Higher means larger clusters in felzenszwalb segmentation.



        sigma : float



            Width of Gaussian kernel for felzenszwalb segmentation.



        min_size : int



            Minimum component size for felzenszwalb segmentation.



    Returns



    -------



        img : ndarray



            image with region label



            region label is stored in the 4th value of each pixel [r,g,b,(region)]



        regions : array of dict



            [



                {



                    'rect': (left, top, width, height),



                    'labels': [...],



                    'size': component_size



                },



                ...



            ]



    '''



    assert im_orig.shape[2] == 3, "3ch image is expected"




 



    # load image and get smallest regions




    # region label is stored in the 4th value of each pixel [r,g,b,(region)]




    img = _generate_segments(im_orig, scale, sigma, min_size)



 



    if img is None:



        return None, {}



 



    imsize = img.shape[0] * img.shape[1]



    R = _extract_regions(img)



 



    # extract neighbouring information




    neighbours = _extract_neighbours(R)



 



    # calculate initial similarities




    S = {}



    for (ai, ar), (bi, br) in neighbours:



        S[(ai, bi)] = _calc_sim(ar, br, imsize)



 



    # hierarchal search




    while S != {}:



 



        # get highest similarity




        i, j = sorted(S.items(), key=lambda i: i[1])[-1][0]



 



        # merge corresponding regions




        t = max(R.keys()) + 1.0




        R[t] = _merge_regions(R[i], R[j])



 



        # mark similarities for regions to be removed




        key_to_delete = []



        for k, v in list(S.items()):



            if (i in k) or (j in k):



                key_to_delete.append(k)



 



        # remove old similarities of related regions




        for k in key_to_delete:



            del S[k]



 



        # calculate similarity set with the new region




        for k in [a for a in key_to_delete if a != (i, j)]:



            n = k[1] if k[0] in (i, j) else k[0]



            S[(t, n)] = _calc_sim(R[t], R[n], imsize)



 



    regions = []



    for k, r in list(R.items()):



        regions.append({



            'rect': (



                r['min_x'], r['min_y'],



                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),



            'size': r['size'],



            'labels': r['labels']



        })



 



    return img, regions

上一篇： js实现轮播图的两种方式(构造函数、面向对象)

下一篇：动漫十大逆天反派人物长得帅，实力强，人气碾压主角

选择性搜索（selective search）

1 前言

2 Selective Search算法

3 Python源码分析

荐 Java刷题笔记15：不同的二叉搜索树（ Unique Binary Search Trees）

使用 Python 中 re 模块对测试用例参数化，进行搜索 search、替换 sub

【leetcode】-700. Search in a Binary Search Tree 查找二叉搜索树

LeetCode 235--二叉搜索树的最近公共祖先 ( Lowest Common Ancestor of a Binary Search Tree ) ( C语言版 )

LeetCode 235. Lowest Common Ancestor of a Binary Search Tree 二叉搜索树

二叉搜索树 (binary search tree)

python使用正则表达式的search()函数实现指定位置搜索功能

利用Yahoo! Search API开发自已的搜索引擎-php版

Elastic Search搜索实例

Elasticsearch(五)----search搜索详解