1. Abstract

Faster R-CNN은 Fast R-CNN에 Region Proposal Network(RPN)을 도입
- fully convolutional network(FCN)로 이루어져있으며, object의 bound와 object score를 계산한다.
- RPN의 output과 Fast R-CNN의 결과를 합치는 과정에서 attention mechanism을 사용
- RPN은 image의 어느 부분에 집중해야할지 알려주는 역할
- Region proposal의 속도를 향상

2. Introduction

proposal들이 CNN feature를 공유하는 방식을 통해 속도가 빨라졌지만, Selective Search를 사용하는 Region proposal method는 greedy하게 진행되기 때문에 모델의 다른 부분에 비하면 속도가 많이 느리다는 단점이 존재
- 따라서, region proposal의 속도를 높이기 위하여 RPN이라는 CNN layer를 사용
RPN
- CNN을 통해 region proposal을 생성하는 layer
- 다양한 범위의 scale의 object detection을 위하여 anchor box multiple scale 방식을 사용
- RPN과 Fast R-CNN을 모두 학습시키기 위하여 object detection과 region proposal을 번갈아가면서 학습하기 때문에 빠르게 학습

Object Proposals
Deep Networks for Object Detection

4. Faster R-CNN

faster r-cnn

Modules
- region proposal을 위한 FCN-based의 RPN
- object detection을 위한 Fast R-CNN
Attention
- RPN의 결과와 attention을 하여 Fast R-CNN이 어느 부분을 ‘look’해야할지 알려주는 구조
sharing CNN layer
- RPN과 Fast R-CNN이 하단의 같은 CNN layer을 공유

4.1 Region Proposal Networks

rpn

RPN
- fully convolutional network로 구성
- Fast R-CNN과 공유하는 하단의 CNN layer의 output을 input으로 사용
- region proposal과 objectness score(그 안에 object가 있는지 없는지를 판단)의 리스트를 반환
Region Proposal
- RPN의 input에 $n \times n$ filter들을 sliding 시키며 각 feature map을 추출
- 해당 feature map은 각각 box classification layer와 bounding box regression layer의 input으로 사용
- box classification layer과 bounding box regression layer은 각각 $1 \times 1$ convolution을 통하여 각각 $2k$ , $4k$ 개의 feature를 추출
- $k$ = anchor box의 개수
- box classification layer는 각 anchor box마다 object가 있는지 없는지를 판단하므로 2개의 value가 필요
- bounding box regression layer는 anchor box를 조절하기 위한 (x,y,w,h) 값이 필요하므로 4개의 value가 필요
- 결과적으로, RPN의 결과로 $6k$ 개의 feature가 생성됨
- 즉, image의 각 위치를 중심으로 가지는 $k$ 개의 bounding box를 만드는 것과 같음
- 따라서, RPN의 input size가 $W \times H \times D$ 라고 한다면, $W \times H \times k$ 개의 anchor box가 생성됨

4.1.1 Anchors

anchor

기존에는 image의 크기를 조절하는 image pyramid 혹은 filter의 크기를 조절하는 filter pyramid를 사용
Faster R-CNN서는 각 window마다 미리 정해진 여러 개의 reference를 사용하는 방식을 채택
- 미리 정해진 크기와 비율의 reference를 anchor box라고 함
각 anchor box는 RPN의 filter의 중심을 중심으로 가짐
- 논문에서는 3개의 scale과 3개의 ratio를 각각 가진 총 9개의 anchor box를 사용

4.1.1.1 Translation-Invariant Anchors

Translation Invariant
- object의 크기가 변하면 proposal의 크기도 변해야 함
- anchor box는 MultiBox method와 다르게 translation invariant를 보장
anchor box는 필요한 parameter가 적기 때문에 학습시간을 줄여주는 역할도 함

4.1.1.2 Multi-Scale Anchors as Regression References

다양한 scale을 학습하는 것에는 3가지 방식 존재
- image pyramid : image의 크기를 다양하게 조절
- filter pyramid : 다양한 크기의 filter를 적용
- anchor pyramid : 정해진 anchor box의 크기와 모양으로 부터 얼마나 줄여야되는지 학습
anchor pyramid는 single-scale의 image와 같은 크기의 filter만 사용하지만, multi-scale 또한 잘 학습
- anchor box가 다양한 크기로 이미 reference되기 때문에 이미 multi-scale적인 design이라고 할 수 있음
- 즉, 다양한 크기의 image와 filter가 없어도 extra cost 없이 multi-scale을 학습
- 따라서, 다른 multi-scale 학습 방식보다 더 적은 computation이 필요

4.1.2 Loss Function

ground truth bounding box와의 IoU가 0.7 이상인 box는 positive anchor로 labeling하고, 0.3 이하인 box는 negative anchor로 labeling하여 해당 anchor box들만 학습에 사용
Multi-task Loss
- $L(p_i, t_i)$ = $\frac{\sum L_{class} (p_i, \hat{p_i})}{N_{class}} + \lambda \frac{\sum L_{reg} (t_i, \hat{t_i})}{N_{reg}}$
- $L_{class}$ 와 $L_{reg}$ 는 Fast R-CNN과 동일
- $N_{classs}$ = mini-batch size
- $N_{reg}$ = number of anchor location
기존의 모델은 서로 다른 RoI pooling의 결과에 같은 regression weight를 가졌지만, Faster R-CNN은 각각의 anchor box마다 다른 weight를 가지므로 다양한 크기의 box를 예측할 수 있음

4.1.3 Training RPNs

RPN 역시 end-to-end로 학습 가능
mini-batch는 한 image에서 256개의 anchor를 sampling하여 사용
- 이 때, positive anchor와 negative anchor를 1:1 비율로 sampling

RPN으로 생성된 proposal에 Fast R-CNN을 적용
RPN과 Fast R-CNN이 같이 shared CNN layer를 학습할 방법이 필요
- Alternating Training 사용
- 그 외에도 Approximate joint training, Non-approximate joint training 등의 기법이 존재

4.2.1 4-Step Alternating Training

RPN 학습
RPN의 proposal을 사용하여 Fast R-CNN 학습
shared CNN-layer를 고정시키고 RPN만 학습
shared CNN-layer를 고정시키고 Fast R-CNN만 학습

4.3 Implementation Details

table1

rescale
- 크기가 아주 작은 image는 rescale해줌
anchor
- 128, 256, 512 size와 1:1, 1:2, 2:1 ratio를 사용
- image를 벗어나는 anchor는 사용하지 않았기 때문에 실제로는 6000개의 anchor 정도만 사용
- objectness score를 기준으로 NMS를 수행하여 IoU가 0.7이상인 것들에 대해서만 수행
- 결과적으로 2000개 정도의 anchor만 남음

5. Experiments

5.1 Experiments on PASCAL VOC

table2-5

5.1.1 Ablation Experiments on RPN

하단의 CNN layer를 공유하는 것이 더 좋은 성능
RPN을 사용하는것이 Selective Search 보다 더 좋은 성능
NMS을 수행하는 것과 하지 않는 것은 성능에 큰 영향을 미치지 않음
bounding box regression이 box classification보다 더 중요한 역할
하단의 CNN layer에 VGG를 사용하는 것이 ZF보다 더 좋은 성능

5.1.2 Performance of VGG-16

table3-4-6-7

5.1.3 Sensitivities to Hyper-parameters

table8-9

5.1.4 Analysis of Recall-to-IoU

fig4

RPN은 적은 수의 proposal만 사용하더라도 좋은 성능을 보여줌

5.1.5 One-Stage Detection vs. Two-Stage Proposal + Detection

table10

5.2 Experiments on MS COCO

table11

5.2.1 Faster R-CNN in ILSVRC & COCO 2015 competitions

VGG-16을 ResNet-101로 바꾸었을 때 더 좋은 성능을 보여줌
즉, layer가 깊어질수록 더 좋은 feature를 추출하므로 더 좋은 성능을 보여줌

5.3 From MS COCO to PASCAL VOC

table12

5.4 Sample

fig5

6. Conclusion

CNN기반의 RPN의 도입으로 region proposal의 quality가 더 좋아졌다.

DevelopHyun

Faster R-CNN[1] Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks(2015) - Review

1. Abstract

2. Introduction

4. Faster R-CNN

4.1 Region Proposal Networks

4.1.1 Anchors

4.1.1.1 Translation-Invariant Anchors

4.1.1.2 Multi-Scale Anchors as Regression References

4.1.2 Loss Function

4.1.3 Training RPNs

4.2.1 4-Step Alternating Training

4.3 Implementation Details

5. Experiments

5.1 Experiments on PASCAL VOC

5.1.1 Ablation Experiments on RPN

5.1.2 Performance of VGG-16

5.1.3 Sensitivities to Hyper-parameters

5.1.4 Analysis of Recall-to-IoU

5.1.5 One-Stage Detection vs. Two-Stage Proposal + Detection

5.2 Experiments on MS COCO

5.2.1 Faster R-CNN in ILSVRC & COCO 2015 competitions

5.3 From MS COCO to PASCAL VOC

5.4 Sample

6. Conclusion

7. Reference

Related Posts

DevelopHyun

1. Abstract

2. Introduction

3. Related Work

4. Faster R-CNN

4.1 Region Proposal Networks

4.1.1 Anchors

4.1.1.1 Translation-Invariant Anchors

4.1.1.2 Multi-Scale Anchors as Regression References

4.1.2 Loss Function

4.1.3 Training RPNs

4.2 Sharing Features for RPN and Fast R-CNN

4.2.1 4-Step Alternating Training

4.3 Implementation Details

5. Experiments

5.1 Experiments on PASCAL VOC

5.1.1 Ablation Experiments on RPN

5.1.2 Performance of VGG-16

5.1.3 Sensitivities to Hyper-parameters

5.1.4 Analysis of Recall-to-IoU

5.1.5 One-Stage Detection vs. Two-Stage Proposal + Detection

5.2 Experiments on MS COCO

5.2.1 Faster R-CNN in ILSVRC & COCO 2015 competitions

5.3 From MS COCO to PASCAL VOC

5.4 Sample

6. Conclusion

7. Reference

Related Posts