欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

选择性搜索(selective search)

程序员文章站 2022-07-05 10:18:06
...

https://blog.csdn.net/guoyunfei20/article/details/78723646

https://blog.csdn.net/Tomxiaodai/article/details/81412354

https://blog.csdn.net/yuanlulu/article/details/82157071

该文翻译整理自:selective search for object detection(c++ / python)


一、目标检测 VS 目标识别

目标识别(objec recognition)是指明一幅输入图像中包含那类目标。其输入为一幅图像,输出是该图像中的目标属于哪个类别(class probability)。而目标检测(object detection)除了要告诉输入图像中包含了哪类目前外,还要框出该目标的具体位置(bounding boxes)。

在目标检测时,为了定位到目标的具体位置,通常会把图像分成许多子块(sub-regions / patches),然后把子块作为输入,送到目标识别的模型中。分子块的最直接方法叫滑动窗口法(sliding window approach)。滑动窗口的方法就是按照子块的大小在整幅图像上穷举所有子图像块。这种方法产生的数据量想想都头大。和滑动窗口法相对的是另外一类基于区域(region proposal)的方法。selective search就是其中之一!

二、selective search算法流程

选择性搜索(selective search)

step0:生成区域集R,具体参见论文《Efficient Graph-Based Image Segmentation》

step1:计算区域集R里每个相邻区域的相似度S={s1,s2,…}
step2:找出相似度最高的两个区域,将其合并为新集,添加进R
step3:从S中移除所有与step2中有关的子集
step4:计算新集与所有子集的相似度
step5:跳至step2,直至S为空


三、相似度计算

论文考虑了颜色、纹理、尺寸和空间交叠这4个参数。

3.1、颜色相似度(color similarity)
将色彩空间转为HSV,每个通道下以bins=25计算直方图,这样每个区域的颜色直方图有25*3=75个区间。 对直方图除以区域尺寸做归一化后使用下式计算相似度:

选择性搜索(selective search)

3.2、纹理相似度(texture similarity)

论文采用方差为1的高斯分布在8个方向做梯度统计,然后将统计结果(尺寸与区域大小一致)以bins=10计算直方图。直方图区间数为8*3*10=240(使用RGB色彩空间)。

选择性搜索(selective search)

其中,选择性搜索(selective search)是直方图中第选择性搜索(selective search)个bin的值。

3.3、尺寸相似度(size similarity)

选择性搜索(selective search)

保证合并操作的尺度较为均匀,避免一个大区域陆续“吃掉”其他小区域。

例:设有区域a-b-c-d-e-f-g-h。较好的合并方式是:ab-cd-ef-gh -> abcd-efgh -> abcdefgh。 不好的合并方法是:ab-c-d-e-f-g-h ->abcd-e-f-g-h ->abcdef-gh -> abcdefgh。

3.4、交叠相似度(shape compatibility measure)

选择性搜索(selective search)

选择性搜索(selective search)

3.5、最终的相似度

选择性搜索(selective search)


四、OpenCV 3.3 实现了selective search

在OpenCV的contrib模块中实现了selective search算法。类定义为:

cv::ximgproc::segmentation::SelectiveSearchSegmentation


举例:

  1. #include "opencv2/ximgproc/segmentation.hpp"
  2. #include "opencv2/highgui.hpp"
  3. #include "opencv2/core.hpp"
  4. #include "opencv2/imgproc.hpp"
  5. #include <iostream>
  6. #include <ctime>
  7. using namespace cv;
  8. using namespace cv::ximgproc::segmentation;
  9. static void help() {
  10. std::cout << std::endl <<
  11. "Usage:" << std::endl <<
  12. "./ssearch input_image (f|q)" << std::endl <<
  13. "f=fast, q=quality" << std::endl <<
  14. "Use l to display less rects, m to display more rects, q to quit" << std::endl;
  15. }
  16. int main(int argc, char** argv) {
  17. // If image path and f/q is not passed as command
  18. // line arguments, quit and display help message
  19. if (argc < 3) {
  20. help();
  21. return -1;
  22. }
  23. // speed-up using multithreads
  24. // void cv::setUseOptimized(bool onoff), Enables or disables the optimized code.
  25. setUseOptimized(true);
  26. setNumThreads(4);
  27. // read image
  28. Mat im = imread(argv[1]);
  29. // resize image
  30. int newHeight = 200;
  31. int newWidth = im.cols*newHeight/im.rows;
  32. resize(im, im, Size(newWidth, newHeight));
  33. // create Selective Search Segmentation Object using default parameters
  34. Ptr<SelectiveSearchSegmentation> ss = createSelectiveSearchSegmentation();
  35. // set input image on which we will run segmentation
  36. ss->setBaseImage(im);
  37. // Switch to fast but low recall Selective Search method
  38. if (argv[2][0] == 'f') {
  39. ss->switchToSelectiveSearchFast();
  40. }
  41. // Switch to high recall but slow Selective Search method
  42. else if (argv[2][0] == 'q') {
  43. ss->switchToSelectiveSearchQuality();
  44. }
  45. // if argument is neither f nor q print help message
  46. else {
  47. help();
  48. return -2;
  49. }
  50. // run selective search segmentation on input image
  51. std::vector<Rect> rects;
  52. ss->process(rects);
  53. std::cout << "Total Number of Region Proposals: " << rects.size() << std::endl;
  54. // number of region proposals to show
  55. int numShowRects = 100;
  56. // increment to increase/decrease total number of reason proposals to be shown
  57. int increment = 50;
  58. while(1) {
  59. // create a copy of original image
  60. Mat imOut = im.clone();
  61. // itereate over all the region proposals
  62. for(int i = 0; i < rects.size(); i++) {
  63. if (i < numShowRects) {
  64. rectangle(imOut, rects[i], Scalar(0, 255, 0));
  65. }
  66. else {
  67. break;
  68. }
  69. }
  70. // show output
  71. imshow("Output", imOut);
  72. // record key press
  73. int k = waitKey();
  74. // m is pressed
  75. if (k == 109) {
  76. // increase total number of rectangles to show by increment
  77. numShowRects += increment;
  78. }
  79. // l is pressed
  80. else if (k == 108 && numShowRects > increment) {
  81. // decrease total number of rectangles to show by increment
  82. numShowRects -= increment;
  83. }
  84. // q is pressed
  85. else if (k == 113) {
  86. break;
  87. }
  88. }
  89. return 0;
  90. }
上边代码git地址:https://code.csdn.net/guoyunfei20/selective_search_opencv_demo.git(运行需要安装OpenCV3.0以上 + contrib)


目录

1 前言

2 Selective Search算法

3 Python源码分析



  • 1 前言

在目标检测时,为了定位到目标的具体位置,通常会把图像分成许多子块(sub-regions / patches),然后把子块作为输入,送到目标识别的模型中。selective search就是一种选择子块的启发式方法。

选择性搜索(selective search)



  • 2 Selective Search算法

主要思路:输入一张图片,首先通过图像分割的方法(如大名鼎鼎的felzenszwalb算法)获得很多小的区域,然后对这些小的区域不断进行合并,一直到无法合并为止。此时这些原始的小区域和合并得到的区域的就是我们得到的bounding box.

选择性搜索(selective search)

选择性搜索(selective search)

算法分为如下几个大步:

1. 生成原始的区域集R(利用felzenszwalb算法)

2. 计算区域集R里每个相邻区域的相似度S={s1,s2,…} 

3. 找出相似度最高的两个区域,将其合并为新集,添加进R 

4. 从S中移除所有与第3步中有关的子集 

5. 计算新集与所有子集的相似度 

6.跳至第三步,不断循环,合并,直至S为空(到不能再合并时为止)

 


 

  • 3 Python源码分析

Github上有一个选择性搜索的简单实现 -- selectivesearch ,可以帮助大家理解

 

  1. import skimage.data
  2. import selectivesearch
  3. img = skimage.data.astronaut()
  4. img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)
  5. regions[:10]
  6. =>
  7. [{'labels': [0.0], 'rect': (0, 0, 15, 24), 'size': 260},
  8. {'labels': [1.0], 'rect': (13, 0, 1, 12), 'size': 23},
  9. {'labels': [2.0], 'rect': (0, 15, 15, 11), 'size': 30},
  10. {'labels': [3.0], 'rect': (15, 14, 0, 0), 'size': 1},
  11. {'labels': [4.0], 'rect': (0, 0, 61, 153), 'size': 4927},
  12. {'labels': [5.0], 'rect': (0, 12, 61, 142), 'size': 177},
  13. {'labels': [6.0], 'rect': (7, 54, 6, 17), 'size': 8},
  14. {'labels': [7.0], 'rect': (28, 50, 18, 32), 'size': 22},
  15. {'labels': [8.0], 'rect': (2, 99, 7, 24), 'size': 24},
  16. {'labels': [9.0], 'rect': (14, 118, 79, 117), 'size': 4008}]

选择性搜索(selective search)



1. 用户生成原始区域集的函数,其中用到了felzenszwalb图像分割算法。每一个区域都有一个编号,将编号并入图片中,方便后面的操作

  1. def _generate_segments(im_orig, scale, sigma, min_size):
  2. """
  3. segment smallest regions by the algorithm of Felzenswalb and
  4. Huttenlocher
  5. """
  6. # open the Image
  7. im_mask = skimage.segmentation.felzenszwalb(
  8. skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
  9. min_size=min_size)
  10. # merge mask channel to the image as a 4th channel
  11. im_orig = numpy.append(
  12. im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
  13. im_orig[:, :, 3] = im_mask
  14. return im_orig

2. 计算两个区域的相似度
论文中考虑了四种相似度 -- 颜色,纹理,尺寸,以及交叠。

其中颜色和纹理相似度,通过获取两个区域的直方图的交集,来判断相似度。

最后的相似度是四种相似度的加和。

  1. def _sim_colour(r1, r2):
  2. """
  3. calculate the sum of histogram intersection of colour
  4. """
  5. return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
  6. def _sim_texture(r1, r2):
  7. """
  8. calculate the sum of histogram intersection of texture
  9. """
  10. return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
  11. def _sim_size(r1, r2, imsize):
  12. """
  13. calculate the size similarity over the image
  14. """
  15. return 1.0 - (r1["size"] + r2["size"]) / imsize
  16. def _sim_fill(r1, r2, imsize):
  17. """
  18. calculate the fill similarity over the image
  19. """
  20. bbsize = (
  21. (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
  22. * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
  23. )
  24. return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
  25. def _calc_sim(r1, r2, imsize):
  26. return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
  27. + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))

3. 用于计算颜色和纹理的直方图的函数

颜色直方图:将色彩空间转为HSV,每个通道下以bins=25计算直方图,这样每个区域的颜色直方图有25*3=75个区间。 对直方图除以区域尺寸做归一化后使用下式计算相似度:

选择性搜索(selective search)

纹理相似度:论文采用方差为1的高斯分布在8个方向做梯度统计,然后将统计结果(尺寸与区域大小一致)以bins=10计算直方图。直方图区间数为8*3*10=240(使用RGB色彩空间)。这里是用了LBP(local binary pattern)获取纹理特征,建立直方图,其余相同

选择性搜索(selective search)

其中,选择性搜索(selective search)是直方图中第选择性搜索(selective search)个bin的值。

  1. def _calc_colour_hist(img):
  2. """
  3. calculate colour histogram for each region
  4. the size of output histogram will be BINS * COLOUR_CHANNELS(3)
  5. number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
  6. extract HSV
  7. """
  8. BINS = 25
  9. hist = numpy.array([])
  10. for colour_channel in (0, 1, 2):
  11. # extracting one colour channel
  12. c = img[:, colour_channel]
  13. # calculate histogram for each colour and join to the result
  14. hist = numpy.concatenate(
  15. [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])
  16. # L1 normalize
  17. hist = hist / len(img)
  18. return hist
  19. def _calc_texture_gradient(img):
  20. """
  21. calculate texture gradient for entire image
  22. The original SelectiveSearch algorithm proposed Gaussian derivative
  23. for 8 orientations, but we use LBP instead.
  24. output will be [height(*)][width(*)]
  25. """
  26. ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))
  27. for colour_channel in (0, 1, 2):
  28. ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
  29. img[:, :, colour_channel], 8, 1.0)
  30. return ret
  31. def _calc_texture_hist(img):
  32. """
  33. calculate texture histogram for each region
  34. calculate the histogram of gradient for each colours
  35. the size of output histogram will be
  36. BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
  37. """
  38. BINS = 10
  39. hist = numpy.array([])
  40. for colour_channel in (0, 1, 2):
  41. # mask by the colour channel
  42. fd = img[:, colour_channel]
  43. # calculate histogram for each orientation and concatenate them all
  44. # and join to the result
  45. hist = numpy.concatenate(
  46. [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])
  47. # L1 Normalize
  48. hist = hist / len(img)
  49. return hist

4. 提取区域的尺寸,颜色和纹理特征

  1. def _extract_regions(img):
  2. R = {}
  3. # get hsv image
  4. hsv = skimage.color.rgb2hsv(img[:, :, :3])
  5. # pass 1: count pixel positions
  6. for y, i in enumerate(img):
  7. for x, (r, g, b, l) in enumerate(i):
  8. # initialize a new region
  9. if l not in R:
  10. R[l] = {
  11. "min_x": 0xffff, "min_y": 0xffff,
  12. "max_x": 0, "max_y": 0, "labels": [l]}
  13. # bounding box
  14. if R[l]["min_x"] > x:
  15. R[l]["min_x"] = x
  16. if R[l]["min_y"] > y:
  17. R[l]["min_y"] = y
  18. if R[l]["max_x"] < x:
  19. R[l]["max_x"] = x
  20. if R[l]["max_y"] < y:
  21. R[l]["max_y"] = y
  22. # pass 2: calculate texture gradient
  23. tex_grad = _calc_texture_gradient(img)
  24. # pass 3: calculate colour histogram of each region
  25. for k, v in list(R.items()):
  26. # colour histogram
  27. masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
  28. R[k]["size"] = len(masked_pixels / 4)
  29. R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
  30. # texture histogram
  31. R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])
  32. return R

5. 找邻居 -- 通过计算每个区域与其余的所有区域是否有相交,来判断是不是邻居

  1. def _extract_neighbours(regions):
  2. def intersect(a, b):
  3. if (a["min_x"] < b["min_x"] < a["max_x"]
  4. and a["min_y"] < b["min_y"] < a["max_y"]) or (
  5. a["min_x"] < b["max_x"] < a["max_x"]
  6. and a["min_y"] < b["max_y"] < a["max_y"]) or (
  7. a["min_x"] < b["min_x"] < a["max_x"]
  8. and a["min_y"] < b["max_y"] < a["max_y"]) or (
  9. a["min_x"] < b["max_x"] < a["max_x"]
  10. and a["min_y"] < b["min_y"] < a["max_y"]):
  11. return True
  12. return False
  13. R = list(regions.items())
  14. neighbours = []
  15. for cur, a in enumerate(R[:-1]):
  16. for b in R[cur + 1:]:
  17. if intersect(a[1], b[1]):
  18. neighbours.append((a, b))
  19. return neighbours

6. 合并两个区域的函数

  1. def _merge_regions(r1, r2):
  2. new_size = r1["size"] + r2["size"]
  3. rt = {
  4. "min_x": min(r1["min_x"], r2["min_x"]),
  5. "min_y": min(r1["min_y"], r2["min_y"]),
  6. "max_x": max(r1["max_x"], r2["max_x"]),
  7. "max_y": max(r1["max_y"], r2["max_y"]),
  8. "size": new_size,
  9. "hist_c": (
  10. r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
  11. "hist_t": (
  12. r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
  13. "labels": r1["labels"] + r2["labels"]
  14. }
  15. return rt

7. 主函数 -- Selective Search

scale:图像分割的集群程度。值越大,意味集群程度越高,分割的越少,获得子区域越大。默认为1

signa: 图像分割前,会先对原图像进行高斯滤波去噪,sigma即为高斯核的大小。默认为0.8

min_size  : 最小的区域像素点个数。当小于此值时,图像分割的计算就停止,默认为20

每次选出相似度最高的一组区域(如编号为100和120的区域),进行合并,得到新的区域(如编号为300)。然后计算新的区域300与区域100的所有邻居和区域120的所有邻居的相似度,加入区域集S。不断循环,知道S为空,此时最后只剩下一个区域,而且它的像素数会非常大,接近原始图片的像素数,因此无法继续合并。最后退出程序。

  1. def selective_search(
  2. im_orig, scale=1.0, sigma=0.8, min_size=50):
  3. '''Selective Search
  4. Parameters
  5. ----------
  6. im_orig : ndarray
  7. Input image
  8. scale : int
  9. Free parameter. Higher means larger clusters in felzenszwalb segmentation.
  10. sigma : float
  11. Width of Gaussian kernel for felzenszwalb segmentation.
  12. min_size : int
  13. Minimum component size for felzenszwalb segmentation.
  14. Returns
  15. -------
  16. img : ndarray
  17. image with region label
  18. region label is stored in the 4th value of each pixel [r,g,b,(region)]
  19. regions : array of dict
  20. [
  21. {
  22. 'rect': (left, top, width, height),
  23. 'labels': [...],
  24. 'size': component_size
  25. },
  26. ...
  27. ]
  28. '''
  29. assert im_orig.shape[2] == 3, "3ch image is expected"
  30. # load image and get smallest regions
  31. # region label is stored in the 4th value of each pixel [r,g,b,(region)]
  32. img = _generate_segments(im_orig, scale, sigma, min_size)
  33. if img is None:
  34. return None, {}
  35. imsize = img.shape[0] * img.shape[1]
  36. R = _extract_regions(img)
  37. # extract neighbouring information
  38. neighbours = _extract_neighbours(R)
  39. # calculate initial similarities
  40. S = {}
  41. for (ai, ar), (bi, br) in neighbours:
  42. S[(ai, bi)] = _calc_sim(ar, br, imsize)
  43. # hierarchal search
  44. while S != {}:
  45. # get highest similarity
  46. i, j = sorted(S.items(), key=lambda i: i[1])[-1][0]
  47. # merge corresponding regions
  48. t = max(R.keys()) + 1.0
  49. R[t] = _merge_regions(R[i], R[j])
  50. # mark similarities for regions to be removed
  51. key_to_delete = []
  52. for k, v in list(S.items()):
  53. if (i in k) or (j in k):
  54. key_to_delete.append(k)
  55. # remove old similarities of related regions
  56. for k in key_to_delete:
  57. del S[k]
  58. # calculate similarity set with the new region
  59. for k in [a for a in key_to_delete if a != (i, j)]:
  60. n = k[1] if k[0] in (i, j) else k[0]
  61. S[(t, n)] = _calc_sim(R[t], R[n], imsize)
  62. regions = []
  63. for k, r in list(R.items()):
  64. regions.append({
  65. 'rect': (
  66. r['min_x'], r['min_y'],
  67. r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
  68. 'size': r['size'],
  69. 'labels': r['labels']
  70. })
  71. return img, regions