Basic introduction
YOLO
The YOLO algorithm does not use a sliding window. Instead, YOLO divides the input image into S×S grid cells, and each cell is responsible for detecting the objects whose centers fall inside it: if the coordinates of an object's center fall into a cell, that cell is responsible for detecting the object. As shown in the figure below, the center point (red dot) of the dog falls into the cell in row 5, column 2, so that cell is responsible for predicting the dog in the image.
First, for localization, each cell's output vector y is divided into p_c, b_x, b_y, b_w, b_h, c_1, c_2, and so on, where b_x, b_y, b_w, b_h locate the bounding box: (b_x, b_y) is its midpoint and (b_w, b_h) are its width and height.
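As a concrete illustration, here is a minimal sketch of how the target for the dog example could be encoded, assuming a 7×7 grid and two classes (both numbers are illustrative, not fixed by the text above):

```python
import torch

S, num_classes = 7, 2                          # illustrative grid size and class count
target = torch.zeros(S, S, 5 + num_classes)    # per cell: [p_c, b_x, b_y, b_w, b_h, c_1, c_2]

# Suppose the dog's center lies at (0.22, 0.68) in normalized image coordinates.
cx, cy, w, h = 0.22, 0.68, 0.40, 0.55
col, row = int(cx * S), int(cy * S)            # the responsible cell (row 5, column 2, 1-indexed)
target[row, col, 0] = 1.0                      # p_c: an object is present in this cell
target[row, col, 1] = cx * S - col             # b_x: center offset within the cell
target[row, col, 2] = cy * S - row             # b_y
target[row, col, 3] = w                        # b_w: width relative to the whole image
target[row, col, 4] = h                        # b_h
target[row, col, 5] = 1.0                      # c_1: class "dog"
```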
At this point, we need to pay attention to a quantity: the ratio of the intersection to the union of the predicted box and the ground-truth box, which we express as the IoU (Intersection over Union).
When we pursue stricter accuracy, we set the IoU threshold higher, for example IoU > 0.7.
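A minimal plain-Python sketch of the IoU computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.14, which would fail an IoU > 0.7 test
```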
Even after discarding predictions with a low IoU, there are still many overlapping, near-duplicate predicted boxes, and we cannot keep them all, so we apply non-maximum suppression (NMS) to keep only the highest-scoring box among overlapping candidates.
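A sketch of greedy non-maximum suppression, reusing the `iou` helper above; in PyTorch, `torchvision.ops.nms` provides a ready-made equivalent:

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, then drop any remaining box that overlaps it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```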
However, a single grid cell may need to predict more than one object, so anchor boxes are introduced: each cell predicts one box per anchor shape.
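With B anchor boxes, the target simply gains an anchor dimension; a shape-only sketch (B = 2 is illustrative):

```python
import torch

S, B, num_classes = 7, 2, 2                     # two anchor shapes per cell (illustrative)
target = torch.zeros(S, S, B, 5 + num_classes)  # each cell can now encode up to B objects
# An object is assigned to the anchor whose shape best matches (highest IoU with) its box,
# so a tall and a wide object sharing the same cell end up in different anchor slots.
```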
Faster R-CNN
Unlike YOLO, Faster R-CNN uses a convolution-based sliding window.
The first stage, as in an ordinary CNN, generally uses a classic feature extraction network such as ResNet or VGG as the backbone; in this way we obtain the shared feature maps.
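A sketch of extracting shared feature maps with a classic backbone; VGG16 from torchvision is used here purely as an example:

```python
import torch
import torchvision

backbone = torchvision.models.vgg16().features   # convolutional layers only, randomly initialized
image = torch.randn(1, 3, 800, 600)
feature_maps = backbone(image)
print(feature_maps.shape)  # torch.Size([1, 512, 25, 18]), the shared feature maps
```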
One of the ways Faster R-CNN increases accuracy is the introduction of region proposals.
The region proposal stage mainly uses an 18-channel 1×1 convolution and a 36-channel 1×1 convolution (2 objectness scores and 4 box offsets for each of 9 anchors) to roughly score and calibrate the candidate regions.
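A minimal sketch of that prediction head, assuming 9 anchors per location (2 scores x 9 anchors = 18 channels, 4 offsets x 9 anchors = 36 channels) and a 512-channel feature map; the 3x3 intermediate convolution follows the usual RPN design:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of the region proposal head applied to the shared feature maps."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(512, num_anchors * 2, kernel_size=1)  # 18 channels: object vs. background per anchor
        self.reg = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # 36 channels: 4 box offsets per anchor

    def forward(self, feature_maps):
        x = torch.relu(self.conv(feature_maps))
        return self.cls(x), self.reg(x)

scores, offsets = RPNHead()(torch.randn(1, 512, 25, 18))
print(scores.shape, offsets.shape)  # torch.Size([1, 18, 25, 18]) torch.Size([1, 36, 25, 18])
```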
Finally, based on the feature maps and RoI pooling, each proposal is mapped to a fixed-size feature, from which we obtain the object's class and refined location.
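torchvision ships an RoI pooling operator; a sketch with one hypothetical proposal on a stride-16 feature map:

```python
import torch
from torchvision.ops import roi_pool

feature_maps = torch.randn(1, 512, 50, 37)   # e.g. a stride-16 map for an 800x600 image
# One hypothetical proposal: (batch_index, x1, y1, x2, y2) in image coordinates.
proposals = torch.tensor([[0.0, 100.0, 80.0, 400.0, 300.0]])
pooled = roi_pool(feature_maps, proposals, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([1, 512, 7, 7]), a fixed-size feature for the detection head
```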
![figure](https://img-blog.csdnimg.cn/20200509151727833.png)
Next, here is the Inception network used for classification.
Two cascaded 3×3 filters are proposed to replace a single 5×5 filter, which reduces computation by a factor of (5×5)/(2×3×3) ≈ 1.39.
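That ratio can be checked by counting parameters (the channel count is arbitrary here, and bias terms are omitted for a clean comparison):

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

c = 64
five_by_five = nn.Conv2d(c, c, kernel_size=5, padding=2, bias=False)
two_three_by_three = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
)
print(n_params(five_by_five) / n_params(two_three_by_three))  # 25 / 18 ≈ 1.39
```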
The source code is as follows (the ConvBNReLU, ConvBNReLUFactorization, and InceptionV3ModuleA/B/C/D helpers referenced here are defined elsewhere in the full source):
```python
import torch
import torch.nn as nn

class InceptionV3ModuleE(nn.Module):
    """Grid-reduction block: two stride-2 convolution branches plus a stride-2 max pool."""
    def __init__(self, in_channels, out_channels1reduce, out_channels1, out_channels2reduce, out_channels2):
        super(InceptionV3ModuleE, self).__init__()
        self.branch1 = nn.Sequential(
            ConvBNReLU(in_channels=in_channels, out_channels=out_channels1reduce, kernel_size=1),
            ConvBNReLU(in_channels=out_channels1reduce, out_channels=out_channels1, kernel_size=3, stride=2),
        )
        self.branch2 = nn.Sequential(
            ConvBNReLU(in_channels=in_channels, out_channels=out_channels2reduce, kernel_size=1),
            ConvBNReLUFactorization(in_channels=out_channels2reduce, out_channels=out_channels2reduce, kernel_sizes=[1, 7], paddings=[0, 3]),
            ConvBNReLUFactorization(in_channels=out_channels2reduce, out_channels=out_channels2reduce, kernel_sizes=[7, 1], paddings=[3, 0]),
            ConvBNReLU(in_channels=out_channels2reduce, out_channels=out_channels2, kernel_size=3, stride=2),
        )
        self.branch3 = nn.MaxPool2d(kernel_size=3, stride=2)

    def forward(self, x):
        out1 = self.branch1(x)
        out2 = self.branch2(x)
        out3 = self.branch3(x)
        out = torch.cat([out1, out2, out3], dim=1)  # concatenate the branches along the channel dimension
        return out

class InceptionAux(nn.Module):
    """Auxiliary classifier attached to the 17x17 feature maps during training."""
    def __init__(self, in_channels, out_channels):
        super(InceptionAux, self).__init__()
        self.auxiliary_avgpool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.auxiliary_conv1 = ConvBNReLU(in_channels=in_channels, out_channels=128, kernel_size=1)
        self.auxiliary_conv2 = nn.Conv2d(in_channels=128, out_channels=768, kernel_size=5, stride=1)
        self.auxiliary_dropout = nn.Dropout(p=0.7)
        self.auxiliary_linear1 = nn.Linear(in_features=768, out_features=out_channels)

    def forward(self, x):
        x = self.auxiliary_conv1(self.auxiliary_avgpool(x))
        x = self.auxiliary_conv2(x)
        x = x.view(x.size(0), -1)
        out = self.auxiliary_linear1(self.auxiliary_dropout(x))
        return out

class InceptionV3(nn.Module):
    def __init__(self, num_classes=1000, stage='train'):
        super(InceptionV3, self).__init__()
        self.stage = stage
        self.block1 = nn.Sequential(
            ConvBNReLU(in_channels=3, out_channels=32, kernel_size=3, stride=2),
            ConvBNReLU(in_channels=32, out_channels=32, kernel_size=3, stride=1),
            ConvBNReLU(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.block2 = nn.Sequential(
            ConvBNReLU(in_channels=64, out_channels=80, kernel_size=3, stride=1),
            ConvBNReLU(in_channels=80, out_channels=192, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.block3 = nn.Sequential(   # 35x35 Inception-A blocks
            InceptionV3ModuleA(in_channels=192, out_channels1=64, out_channels2reduce=48, out_channels2=64, out_channels3reduce=64, out_channels3=96, out_channels4=32),
            InceptionV3ModuleA(in_channels=256, out_channels1=64, out_channels2reduce=48, out_channels2=64, out_channels3reduce=64, out_channels3=96, out_channels4=64),
            InceptionV3ModuleA(in_channels=288, out_channels1=64, out_channels2reduce=48, out_channels2=64, out_channels3reduce=64, out_channels3=96, out_channels4=64)
        )
        self.block4 = nn.Sequential(   # reduction to 17x17, then Inception-B blocks
            InceptionV3ModuleD(in_channels=288, out_channels1reduce=384, out_channels1=384, out_channels2reduce=64, out_channels2=96),
            InceptionV3ModuleB(in_channels=768, out_channels1=192, out_channels2reduce=128, out_channels2=192, out_channels3reduce=128, out_channels3=192, out_channels4=192),
            InceptionV3ModuleB(in_channels=768, out_channels1=192, out_channels2reduce=160, out_channels2=192, out_channels3reduce=160, out_channels3=192, out_channels4=192),
            InceptionV3ModuleB(in_channels=768, out_channels1=192, out_channels2reduce=160, out_channels2=192, out_channels3reduce=160, out_channels3=192, out_channels4=192),
            InceptionV3ModuleB(in_channels=768, out_channels1=192, out_channels2reduce=192, out_channels2=192, out_channels3reduce=192, out_channels3=192, out_channels4=192),
        )
        if self.stage == 'train':
            self.aux_logits = InceptionAux(in_channels=768, out_channels=num_classes)
        self.block5 = nn.Sequential(   # reduction to 8x8, then Inception-C blocks
            InceptionV3ModuleE(in_channels=768, out_channels1reduce=192, out_channels1=320, out_channels2reduce=192, out_channels2=192),
            InceptionV3ModuleC(in_channels=1280, out_channels1=320, out_channels2reduce=384, out_channels2=384, out_channels3reduce=448, out_channels3=384, out_channels4=192),
            InceptionV3ModuleC(in_channels=2048, out_channels1=320, out_channels2reduce=384, out_channels2=384, out_channels3reduce=448, out_channels3=384, out_channels4=192),
        )
        self.max_pool = nn.MaxPool2d(kernel_size=8, stride=1)
        self.dropout = nn.Dropout(p=0.5)
        self.linear = nn.Linear(2048, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        aux = x = self.block4(x)       # keep the 17x17 features for the auxiliary classifier
        x = self.block5(x)
        x = self.max_pool(x)
        x = self.dropout(x)
        x = x.view(x.size(0), -1)
        out = self.linear(x)
        if self.stage == 'train':
            aux = self.aux_logits(aux)
            return aux, out
        else:
            return out
```
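A quick smoke test of the network above (this assumes the ConvBNReLU and InceptionV3ModuleA/B/C/D helpers from the full source are defined; Inception V3 expects 299x299 inputs):

```python
model = InceptionV3(num_classes=1000, stage='train')
images = torch.randn(2, 3, 299, 299)
aux_logits, logits = model(images)
print(aux_logits.shape, logits.shape)   # torch.Size([2, 1000]) torch.Size([2, 1000])

model.stage = 'eval'                    # at inference time only the main logits are returned
print(model(images).shape)              # torch.Size([2, 1000])
```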