选择性搜索(selective search)
https://blog.csdn.net/guoyunfei20/article/details/78723646
https://blog.csdn.net/Tomxiaodai/article/details/81412354
https://blog.csdn.net/yuanlulu/article/details/82157071
该文翻译整理自:selective search for object detection(c++ / python)
一、目标检测 VS 目标识别
目标识别(objec recognition)是指明一幅输入图像中包含那类目标。其输入为一幅图像,输出是该图像中的目标属于哪个类别(class probability)。而目标检测(object detection)除了要告诉输入图像中包含了哪类目前外,还要框出该目标的具体位置(bounding boxes)。
在目标检测时,为了定位到目标的具体位置,通常会把图像分成许多子块(sub-regions / patches),然后把子块作为输入,送到目标识别的模型中。分子块的最直接方法叫滑动窗口法(sliding window approach)。滑动窗口的方法就是按照子块的大小在整幅图像上穷举所有子图像块。这种方法产生的数据量想想都头大。和滑动窗口法相对的是另外一类基于区域(region proposal)的方法。selective search就是其中之一!
二、selective search算法流程
step0:生成区域集R,具体参见论文《Efficient Graph-Based Image Segmentation》
step1:计算区域集R里每个相邻区域的相似度S={s1,s2,…}
step2:找出相似度最高的两个区域,将其合并为新集,添加进R
step3:从S中移除所有与step2中有关的子集
step4:计算新集与所有子集的相似度
step5:跳至step2,直至S为空
三、相似度计算
论文考虑了颜色、纹理、尺寸和空间交叠这4个参数。
3.1、颜色相似度(color similarity)
将色彩空间转为HSV,每个通道下以bins=25计算直方图,这样每个区域的颜色直方图有25*3=75个区间。 对直方图除以区域尺寸做归一化后使用下式计算相似度:
3.2、纹理相似度(texture similarity)
论文采用方差为1的高斯分布在8个方向做梯度统计,然后将统计结果(尺寸与区域大小一致)以bins=10计算直方图。直方图区间数为8*3*10=240(使用RGB色彩空间)。
其中,是直方图中第个bin的值。
3.3、尺寸相似度(size similarity)
保证合并操作的尺度较为均匀,避免一个大区域陆续“吃掉”其他小区域。
例:设有区域a-b-c-d-e-f-g-h。较好的合并方式是:ab-cd-ef-gh -> abcd-efgh -> abcdefgh。 不好的合并方法是:ab-c-d-e-f-g-h ->abcd-e-f-g-h ->abcdef-gh -> abcdefgh。
3.4、交叠相似度(shape compatibility measure)
3.5、最终的相似度
四、OpenCV 3.3 实现了selective search
在OpenCV的contrib模块中实现了selective search算法。类定义为:
cv::ximgproc::segmentation::SelectiveSearchSegmentation
-
#include "opencv2/ximgproc/segmentation.hpp"
-
#include "opencv2/highgui.hpp"
-
#include "opencv2/core.hpp"
-
#include "opencv2/imgproc.hpp"
-
#include <iostream>
-
#include <ctime>
-
-
using namespace cv;
-
using namespace cv::ximgproc::segmentation;
-
-
static void help() {
-
std::cout << std::endl <<
-
"Usage:" << std::endl <<
-
"./ssearch input_image (f|q)" << std::endl <<
-
"f=fast, q=quality" << std::endl <<
-
"Use l to display less rects, m to display more rects, q to quit" << std::endl;
-
}
-
-
-
int main(int argc, char** argv) {
-
// If image path and f/q is not passed as command
-
// line arguments, quit and display help message
-
if (argc < 3) {
-
help();
-
return -1;
-
}
-
-
// speed-up using multithreads
-
// void cv::setUseOptimized(bool onoff), Enables or disables the optimized code.
-
setUseOptimized(true);
-
setNumThreads(4);
-
-
// read image
-
Mat im = imread(argv[1]);
-
// resize image
-
int newHeight = 200;
-
int newWidth = im.cols*newHeight/im.rows;
-
resize(im, im, Size(newWidth, newHeight));
-
-
// create Selective Search Segmentation Object using default parameters
-
Ptr<SelectiveSearchSegmentation> ss = createSelectiveSearchSegmentation();
-
// set input image on which we will run segmentation
-
ss->setBaseImage(im);
-
-
// Switch to fast but low recall Selective Search method
-
if (argv[2][0] == 'f') {
-
ss->switchToSelectiveSearchFast();
-
}
-
// Switch to high recall but slow Selective Search method
-
else if (argv[2][0] == 'q') {
-
ss->switchToSelectiveSearchQuality();
-
}
-
// if argument is neither f nor q print help message
-
else {
-
help();
-
return -2;
-
}
-
-
// run selective search segmentation on input image
-
std::vector<Rect> rects;
-
ss->process(rects);
-
std::cout << "Total Number of Region Proposals: " << rects.size() << std::endl;
-
-
// number of region proposals to show
-
int numShowRects = 100;
-
// increment to increase/decrease total number of reason proposals to be shown
-
int increment = 50;
-
-
while(1) {
-
// create a copy of original image
-
Mat imOut = im.clone();
-
-
// itereate over all the region proposals
-
for(int i = 0; i < rects.size(); i++) {
-
if (i < numShowRects) {
-
rectangle(imOut, rects[i], Scalar(0, 255, 0));
-
}
-
else {
-
break;
-
}
-
}
-
-
// show output
-
imshow("Output", imOut);
-
-
// record key press
-
int k = waitKey();
-
-
// m is pressed
-
if (k == 109) {
-
// increase total number of rectangles to show by increment
-
numShowRects += increment;
-
}
-
// l is pressed
-
else if (k == 108 && numShowRects > increment) {
-
// decrease total number of rectangles to show by increment
-
numShowRects -= increment;
-
}
-
// q is pressed
-
else if (k == 113) {
-
break;
-
}
-
}
-
return 0;
-
}
上边代码git地址:https://code.csdn.net/guoyunfei20/selective_search_opencv_demo.git(运行需要安装OpenCV3.0以上 + contrib)目录
在目标检测时,为了定位到目标的具体位置,通常会把图像分成许多子块(sub-regions / patches),然后把子块作为输入,送到目标识别的模型中。selective search就是一种选择子块的启发式方法。
主要思路:输入一张图片,首先通过图像分割的方法(如大名鼎鼎的felzenszwalb算法)获得很多小的区域,然后对这些小的区域不断进行合并,一直到无法合并为止。此时这些原始的小区域和合并得到的区域的就是我们得到的bounding box.
算法分为如下几个大步:
1. 生成原始的区域集R(利用felzenszwalb算法)
2. 计算区域集R里每个相邻区域的相似度S={s1,s2,…}
3. 找出相似度最高的两个区域,将其合并为新集,添加进R
4. 从S中移除所有与第3步中有关的子集
5. 计算新集与所有子集的相似度
6.跳至第三步,不断循环,合并,直至S为空(到不能再合并时为止)
Github上有一个选择性搜索的简单实现 -- selectivesearch ,可以帮助大家理解
-
import skimage.data
-
import selectivesearch
-
-
img = skimage.data.astronaut()
-
img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)
-
regions[:10]
-
=>
-
[{'labels': [0.0], 'rect': (0, 0, 15, 24), 'size': 260},
-
{'labels': [1.0], 'rect': (13, 0, 1, 12), 'size': 23},
-
{'labels': [2.0], 'rect': (0, 15, 15, 11), 'size': 30},
-
{'labels': [3.0], 'rect': (15, 14, 0, 0), 'size': 1},
-
{'labels': [4.0], 'rect': (0, 0, 61, 153), 'size': 4927},
-
{'labels': [5.0], 'rect': (0, 12, 61, 142), 'size': 177},
-
{'labels': [6.0], 'rect': (7, 54, 6, 17), 'size': 8},
-
{'labels': [7.0], 'rect': (28, 50, 18, 32), 'size': 22},
-
{'labels': [8.0], 'rect': (2, 99, 7, 24), 'size': 24},
-
{'labels': [9.0], 'rect': (14, 118, 79, 117), 'size': 4008}]
1. 用户生成原始区域集的函数,其中用到了felzenszwalb图像分割算法。每一个区域都有一个编号,将编号并入图片中,方便后面的操作
-
def _generate_segments(im_orig, scale, sigma, min_size):
-
"""
-
segment smallest regions by the algorithm of Felzenswalb and
-
Huttenlocher
-
"""
-
-
# open the Image
-
im_mask = skimage.segmentation.felzenszwalb(
-
skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
-
min_size=min_size)
-
-
# merge mask channel to the image as a 4th channel
-
im_orig = numpy.append(
-
im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
-
im_orig[:, :, 3] = im_mask
-
-
return im_orig
2. 计算两个区域的相似度
论文中考虑了四种相似度 -- 颜色,纹理,尺寸,以及交叠。
其中颜色和纹理相似度,通过获取两个区域的直方图的交集,来判断相似度。
最后的相似度是四种相似度的加和。
-
def _sim_colour(r1, r2):
-
"""
-
calculate the sum of histogram intersection of colour
-
"""
-
return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
-
-
-
def _sim_texture(r1, r2):
-
"""
-
calculate the sum of histogram intersection of texture
-
"""
-
return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
-
-
-
def _sim_size(r1, r2, imsize):
-
"""
-
calculate the size similarity over the image
-
"""
-
return 1.0 - (r1["size"] + r2["size"]) / imsize
-
-
-
def _sim_fill(r1, r2, imsize):
-
"""
-
calculate the fill similarity over the image
-
"""
-
bbsize = (
-
(max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
-
* (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
-
)
-
return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
-
-
-
def _calc_sim(r1, r2, imsize):
-
return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
-
+ _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))
3. 用于计算颜色和纹理的直方图的函数
颜色直方图:将色彩空间转为HSV,每个通道下以bins=25计算直方图,这样每个区域的颜色直方图有25*3=75个区间。 对直方图除以区域尺寸做归一化后使用下式计算相似度:
纹理相似度:论文采用方差为1的高斯分布在8个方向做梯度统计,然后将统计结果(尺寸与区域大小一致)以bins=10计算直方图。直方图区间数为8*3*10=240(使用RGB色彩空间)。这里是用了LBP(local binary pattern)获取纹理特征,建立直方图,其余相同
其中,是直方图中第个bin的值。
-
def _calc_colour_hist(img):
-
"""
-
calculate colour histogram for each region
-
the size of output histogram will be BINS * COLOUR_CHANNELS(3)
-
number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
-
extract HSV
-
"""
-
-
BINS = 25
-
hist = numpy.array([])
-
-
for colour_channel in (0, 1, 2):
-
-
# extracting one colour channel
-
c = img[:, colour_channel]
-
-
# calculate histogram for each colour and join to the result
-
hist = numpy.concatenate(
-
[hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])
-
-
# L1 normalize
-
hist = hist / len(img)
-
-
return hist
-
-
-
def _calc_texture_gradient(img):
-
"""
-
calculate texture gradient for entire image
-
The original SelectiveSearch algorithm proposed Gaussian derivative
-
for 8 orientations, but we use LBP instead.
-
output will be [height(*)][width(*)]
-
"""
-
ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))
-
-
for colour_channel in (0, 1, 2):
-
ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
-
img[:, :, colour_channel], 8, 1.0)
-
-
return ret
-
-
-
def _calc_texture_hist(img):
-
"""
-
calculate texture histogram for each region
-
calculate the histogram of gradient for each colours
-
the size of output histogram will be
-
BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
-
"""
-
BINS = 10
-
-
hist = numpy.array([])
-
-
for colour_channel in (0, 1, 2):
-
-
# mask by the colour channel
-
fd = img[:, colour_channel]
-
-
# calculate histogram for each orientation and concatenate them all
-
# and join to the result
-
hist = numpy.concatenate(
-
[hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])
-
-
# L1 Normalize
-
hist = hist / len(img)
-
-
return hist
-
4. 提取区域的尺寸,颜色和纹理特征
-
def _extract_regions(img):
-
-
R = {}
-
-
# get hsv image
-
hsv = skimage.color.rgb2hsv(img[:, :, :3])
-
-
# pass 1: count pixel positions
-
for y, i in enumerate(img):
-
-
for x, (r, g, b, l) in enumerate(i):
-
-
# initialize a new region
-
if l not in R:
-
R[l] = {
-
"min_x": 0xffff, "min_y": 0xffff,
-
"max_x": 0, "max_y": 0, "labels": [l]}
-
-
# bounding box
-
if R[l]["min_x"] > x:
-
R[l]["min_x"] = x
-
if R[l]["min_y"] > y:
-
R[l]["min_y"] = y
-
if R[l]["max_x"] < x:
-
R[l]["max_x"] = x
-
if R[l]["max_y"] < y:
-
R[l]["max_y"] = y
-
-
# pass 2: calculate texture gradient
-
tex_grad = _calc_texture_gradient(img)
-
-
# pass 3: calculate colour histogram of each region
-
for k, v in list(R.items()):
-
-
# colour histogram
-
masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
-
R[k]["size"] = len(masked_pixels / 4)
-
R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
-
-
# texture histogram
-
R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])
-
-
return R
5. 找邻居 -- 通过计算每个区域与其余的所有区域是否有相交,来判断是不是邻居
-
def _extract_neighbours(regions):
-
-
def intersect(a, b):
-
if (a["min_x"] < b["min_x"] < a["max_x"]
-
and a["min_y"] < b["min_y"] < a["max_y"]) or (
-
a["min_x"] < b["max_x"] < a["max_x"]
-
and a["min_y"] < b["max_y"] < a["max_y"]) or (
-
a["min_x"] < b["min_x"] < a["max_x"]
-
and a["min_y"] < b["max_y"] < a["max_y"]) or (
-
a["min_x"] < b["max_x"] < a["max_x"]
-
and a["min_y"] < b["min_y"] < a["max_y"]):
-
return True
-
return False
-
-
R = list(regions.items())
-
neighbours = []
-
for cur, a in enumerate(R[:-1]):
-
for b in R[cur + 1:]:
-
if intersect(a[1], b[1]):
-
neighbours.append((a, b))
-
-
return neighbours
6. 合并两个区域的函数
-
def _merge_regions(r1, r2):
-
new_size = r1["size"] + r2["size"]
-
rt = {
-
"min_x": min(r1["min_x"], r2["min_x"]),
-
"min_y": min(r1["min_y"], r2["min_y"]),
-
"max_x": max(r1["max_x"], r2["max_x"]),
-
"max_y": max(r1["max_y"], r2["max_y"]),
-
"size": new_size,
-
"hist_c": (
-
r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
-
"hist_t": (
-
r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
-
"labels": r1["labels"] + r2["labels"]
-
}
-
return rt
7. 主函数 -- Selective Search
scale:图像分割的集群程度。值越大,意味集群程度越高,分割的越少,获得子区域越大。默认为1
signa: 图像分割前,会先对原图像进行高斯滤波去噪,sigma即为高斯核的大小。默认为0.8
min_size : 最小的区域像素点个数。当小于此值时,图像分割的计算就停止,默认为20
每次选出相似度最高的一组区域(如编号为100和120的区域),进行合并,得到新的区域(如编号为300)。然后计算新的区域300与区域100的所有邻居和区域120的所有邻居的相似度,加入区域集S。不断循环,知道S为空,此时最后只剩下一个区域,而且它的像素数会非常大,接近原始图片的像素数,因此无法继续合并。最后退出程序。
-
def selective_search(
-
im_orig, scale=1.0, sigma=0.8, min_size=50):
-
'''Selective Search
-
Parameters
-
----------
-
im_orig : ndarray
-
Input image
-
scale : int
-
Free parameter. Higher means larger clusters in felzenszwalb segmentation.
-
sigma : float
-
Width of Gaussian kernel for felzenszwalb segmentation.
-
min_size : int
-
Minimum component size for felzenszwalb segmentation.
-
Returns
-
-------
-
img : ndarray
-
image with region label
-
region label is stored in the 4th value of each pixel [r,g,b,(region)]
-
regions : array of dict
-
[
-
{
-
'rect': (left, top, width, height),
-
'labels': [...],
-
'size': component_size
-
},
-
...
-
]
-
'''
-
assert im_orig.shape[2] == 3, "3ch image is expected"
-
-
# load image and get smallest regions
-
# region label is stored in the 4th value of each pixel [r,g,b,(region)]
-
img = _generate_segments(im_orig, scale, sigma, min_size)
-
-
if img is None:
-
return None, {}
-
-
imsize = img.shape[0] * img.shape[1]
-
R = _extract_regions(img)
-
-
# extract neighbouring information
-
neighbours = _extract_neighbours(R)
-
-
# calculate initial similarities
-
S = {}
-
for (ai, ar), (bi, br) in neighbours:
-
S[(ai, bi)] = _calc_sim(ar, br, imsize)
-
-
# hierarchal search
-
while S != {}:
-
-
# get highest similarity
-
i, j = sorted(S.items(), key=lambda i: i[1])[-1][0]
-
-
# merge corresponding regions
-
t = max(R.keys()) + 1.0
-
R[t] = _merge_regions(R[i], R[j])
-
-
# mark similarities for regions to be removed
-
key_to_delete = []
-
for k, v in list(S.items()):
-
if (i in k) or (j in k):
-
key_to_delete.append(k)
-
-
# remove old similarities of related regions
-
for k in key_to_delete:
-
del S[k]
-
-
# calculate similarity set with the new region
-
for k in [a for a in key_to_delete if a != (i, j)]:
-
n = k[1] if k[0] in (i, j) else k[0]
-
S[(t, n)] = _calc_sim(R[t], R[n], imsize)
-
-
regions = []
-
for k, r in list(R.items()):
-
regions.append({
-
'rect': (
-
r['min_x'], r['min_y'],
-
r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
-
'size': r['size'],
-
'labels': r['labels']
-
})
-
-
return img, regions
推荐阅读
-
荐 Java刷题笔记15:不同的二叉搜索树( Unique Binary Search Trees)
-
使用 Python 中 re 模块对测试用例参数化,进行搜索 search、替换 sub
-
【leetcode】-700. Search in a Binary Search Tree 查找二叉搜索树
-
LeetCode 235--二叉搜索树的最近公共祖先 ( Lowest Common Ancestor of a Binary Search Tree ) ( C语言版 )
-
LeetCode 235. Lowest Common Ancestor of a Binary Search Tree 二叉搜索树
-
二叉搜索树 (binary search tree)
-
python使用正则表达式的search()函数实现指定位置搜索功能
-
利用Yahoo! Search API开发自已的搜索引擎-php版
-
Elastic Search搜索实例
-
Elasticsearch(五)----search搜索详解