DevelopHyun

Data Science & Algorithm with Computer Science

resNet[2] Identity Mappings in Deep Residual Networks(2016) - Review

10 Feb 2018 » deeplearning, cnn, resnet, paperreview

1. Abstract

A paper on why residual networks are effective and how they can be improved further
It shows with equations that using residual blocks addresses the vanishing gradient problem
- The backpropagated error can be passed directly to any block

2. Introduction

proposed residual unit

general form of residual unit
- ${y_{l}}$ = $h({x_{l}}) + F({x_{l},{W_{l}}})$
- ${x_{l+1}}$ = $f({y_{l}})$
- In the original ResNet, $h$ = identity mapping, $f$ = ReLU, $F$ = residual function
Leaving the shortcut path completely untouched is the most effective choice (the proposed residual unit)
- To make $f$ an identity mapping as well, the weight layers are rearranged in a 'pre-activation' order
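
The general form above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: `residual_fn` is a hypothetical stand-in for the convolutional residual function $F$ (a scalar multiply followed by ReLU), and the names `residual_unit`, `identity`, and `relu` are chosen here, not taken from the paper.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def identity(v):
    """Identity mapping: returns the input values unchanged."""
    return list(v)


def residual_fn(x, w):
    """Toy residual function F(x, W): scalar multiply, then ReLU (assumption, not a real conv)."""
    return relu([w * a for a in x])


def residual_unit(x, w, h=identity, f=relu):
    """General form: y = h(x) + F(x, W), then x_next = f(y)."""
    y = [a + b for a, b in zip(h(x), residual_fn(x, w))]
    return f(y)


x = [1.0, -2.0, 3.0]
original = residual_unit(x, w=0.5, f=relu)      # f = ReLU applied after the addition
proposed = residual_unit(x, w=0.5, f=identity)  # f = identity, so x_next = x + F(x, W)
print(original)
print(proposed)
```

With $f$ = ReLU the negative entry of `x` is clipped after the addition, while with $f$ = identity it passes through the unit unchanged, which is the property the proposed unit relies on.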

3. Analysis of Deep Residual Network

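In the proposed residual unit, $f$ is an identity mapping
- ${x_{l+1}}$ = $f({y_{l}})$ = ${y_{l}}$
When $h$ and $f$ are both identity mappings, every deeper $x_{L}$ can be expressed in terms of any shallower $x_{l}$
- Backpropagation can therefore be examined with the chain rule
- ${x_{l+1}}$ = ${x_{l}} + F({x_{l},{W_{l}}})$
- $x_{L}$ = $x_{l} + \sum_{i=l}^{L-1}F(x_{i}, W_{i})$
- $cost$ = $\varepsilon$
- $\frac{\partial \varepsilon}{\partial x_{l}}$ = $\frac{\partial \varepsilon}{\partial x_{L}}\left(1 + \frac{\partial}{\partial x_{l}}\sum_{i=l}^{L-1}F(x_{i}, W_{i})\right)$
- In this expression the term $\frac{\partial \varepsilon}{\partial x_{L}}$ reaches unit $l$ directly, so the gradient at $L$ is propagated to $l$ no matter how small $W$ is
- The gradient does not vanish, since the summation term is unlikely to be exactly $-1$ for every sample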
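
The effect of the recursion ${x_{l+1}}$ = ${x_{l}} + F({x_{l},{W_{l}}})$ on the gradient can be checked numerically. The sketch below is a toy setup, not the paper's experiment: it assumes a scalar residual function `F(x, w) = w * tanh(x)`, 50 stacked units with very small weights, and a finite-difference estimate of $\partial x_{L} / \partial x_{l}$. All names are hypothetical.

```python
import math


def F(x, w):
    """Toy scalar residual function (assumption): F(x, w) = w * tanh(x)."""
    return w * math.tanh(x)


def forward(x_l, weights, shortcut=True):
    """Stack units: x_{i+1} = x_i + F(x_i, w_i) with a shortcut, or x_{i+1} = F(x_i, w_i) without."""
    x = x_l
    for w in weights:
        x = (x if shortcut else 0.0) + F(x, w)
    return x


def grad_wrt_input(x_l, weights, shortcut, eps=1e-6):
    """Central finite-difference estimate of d x_L / d x_l."""
    hi = forward(x_l + eps, weights, shortcut)
    lo = forward(x_l - eps, weights, shortcut)
    return (hi - lo) / (2 * eps)


weights = [1e-4] * 50  # very small weights in every unit
residual_grad = grad_wrt_input(0.7, weights, shortcut=True)
plain_grad = grad_wrt_input(0.7, weights, shortcut=False)
print(residual_grad, plain_grad)
```

With the identity shortcut the estimate stays close to 1, while the same stack without the shortcut gives an estimate that is numerically zero.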

4. On the Importance of Identity Skip Connections

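Case where the shortcut connection is multiplied by a scalar
- $h(x) = \lambda x$
- Across many units the shortcut signal is scaled by a product of $\lambda$ factors: with $\lambda > 1$ it becomes exponentially large, and with $\lambda < 1$ it becomes exponentially small
- It is best to leave the shortcut connection untouched
Several other shortcut architectures were also tried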
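
The scaling argument can be made concrete with a short calculation. Assuming the same constant $\lambda$ in every unit (a simplification; the helper name `shortcut_factor` is hypothetical), the shortcut term is multiplied by $\lambda$ once per unit, so after `depth` units the factor is $\lambda^{depth}$.

```python
def shortcut_factor(lam, depth):
    """Factor multiplying the shortcut term after `depth` units with h(x) = lam * x."""
    return lam ** depth


# Compare a slightly shrinking, an identity, and a slightly growing shortcut over 100 units.
for lam in (0.9, 1.0, 1.1):
    print(lam, shortcut_factor(lam, 100))
```

Even a scale as close to 1 as 0.9 or 1.1 changes the shortcut term by several orders of magnitude over 100 units (roughly 2.7e-5 and 1.4e4), whereas $\lambda = 1$ leaves it unchanged.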

4.1 Experiments on skip connection

It is best to keep the shortcut path clean

5. On the Usage of Activation Functions

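Experiments vary the positions of the activation function, normalization, and convolution
Signals propagate properly only when no activation is applied after the addition
Using BN and ReLU together as pre-activation gives the best performance
- Training error is higher than that of the original model, but the model generalizes better, so test error is lower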
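
The two orderings can be contrasted in a small sketch. It rests on loud simplifications: a scalar multiply (`weight`) stands in for each convolution, `batch_norm` normalizes a list of floats without learned scale or shift, and the function names are hypothetical. Only the order of operations follows the original unit (weight, BN, ReLU, weight, BN, addition, ReLU) and the full pre-activation unit (BN, ReLU, weight, BN, ReLU, weight, addition).

```python
def batch_norm(batch, eps=1e-5):
    """Normalize a list of floats to zero mean and unit variance (no learned parameters)."""
    mean = sum(batch) / len(batch)
    var = sum((a - mean) ** 2 for a in batch) / len(batch)
    return [(a - mean) / (var + eps) ** 0.5 for a in batch]


def relu(batch):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in batch]


def weight(batch, w):
    """Scalar multiply standing in for a convolution (assumption)."""
    return [w * a for a in batch]


def original_unit(x, w1, w2):
    """Post-activation: weight -> BN -> ReLU -> weight -> BN, add shortcut, then ReLU."""
    r = batch_norm(weight(relu(batch_norm(weight(x, w1))), w2))
    return relu([a + b for a, b in zip(x, r)])


def preact_unit(x, w1, w2):
    """Full pre-activation: BN -> ReLU -> weight -> BN -> ReLU -> weight, then add shortcut."""
    r = weight(relu(batch_norm(x)), w1)
    r = weight(relu(batch_norm(r)), w2)
    return [a + b for a, b in zip(x, r)]


x = [-3.0, -1.0, 0.5, 2.0]
out_original = original_unit(x, w1=0.8, w2=-0.5)
out_preact = preact_unit(x, w1=0.8, w2=-0.5)
print(out_original)
print(out_preact)
```

In the original ordering the final ReLU forces every output to be non-negative, so the shortcut signal is altered between units. In the pre-activation ordering nothing is applied after the addition, and negative inputs can pass through.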

5.1 Experiments on Activation

Activation and normalization, too, are best applied only on the convolution layers of the residual path
