1. Abstract

resNet이 왜 성능이 좋은지에 대한 논문
resNet은 서로 다른 길이의 path들의 collection이다.
- path끼리의 연관성이 적다
- 여러 길이의 path들은 ensemble과 같은 역할을 한다.
실제로 모델에 많은 영향을 미치는 것은 짧은 path
- 만약 110-layers model이라면, 실제로는 10-34 layers 깊이의 path들이 핵심

2. Introduction

resNet
- shortcut path는 데이터를 바로 다른 레이어로 보낼 수 있다.
- resNet을 통하여 더 깊게 모델을 만들 수 있다.
- test시에 layer를 빼도 크게 영향이 없다.(VGG의 경우, 하나만 빼도 치명적)
path의 길이는 binomial distribution을 따른다.
- n개의 layer라면, n/2-layer를 지나는 path가 가장 많음
- 실제로 gradient에 영향을 미치는 path는 n/2보다 짧은 path
resNet은 모델이 깊어지며 발생하는 vanishing gradient 문제를 해결해서 성능이 좋은 것이 아니라, 깊어질수록 effective path를 많이 만들어 ensemble과 같은 역할을 해서 성능이 좋은 것

2.1 Unraveled View

unraveled View

network를 펼쳐놓은 형태

The sequential and hierarchical computer vision pipeline
Residual networks
Highway networks
- ${y_{i+1}}$ = ${f_{i+1}}({y_{i}}) \cdot {t_{i+1}}({y_{i}}) + {y_{i}} \cdot (1-{t_{i+1}}({y_{i}}))$
- residual network의 일반형
- main path와 shortcut path에 곱해질 가중치값을 학습
Investigating neural networks
Ensembling
Dropout

4. The unraveled view of residual networks

unraveled view

n-layers network의 총 path 개수는 $2^{n}$
unraveled view를 식으로 표현하면 아래와 같아진다.
- 중간의 layer가 사라져도 output값을 뽑아낼 수 있다.
- 서로 다른 path는 다른 path에게 영향을 별로 주지 않는다.
- 여러 개의 path가 더해져 output을 만들어낸다. = ensemble
VGG의 경우 식으로 표현하면 아래와 같아진다.
- 중간의 layer가 사라지면 output값 자체를 뽑아낼 수 없다.

5. Lesion study(손상연구)

layer를 지워가며 모델에 어떤 영향을 미치는지 확인
- path끼리 의존성이 있는지?
- ensemble로 작용하는지?

5.1 Deleting individual layers from neural networks at test time

VGG의 경우 layer를 하나만 줄여도 error가 치명적으로 올라갔지만, resNet은 그렇지 않음
- 한 개의 path를 지워도 절반의 path가 남아있기 때문
resNet의 경우, downsampling block를 없앴을 경우에만 error가 올라감
path끼리 서로 의존적이지 않다는 것을 의미

5.2 Deleting many modules from residual networks at test-time

만약 ensemble이라면 member의 수에 따라서 성능이 달라짐
residual module을 delete할 때마다 에러가 올라감
- valid path가 줄어드는 것과 연관

5.3 Reordering modules in residual networks at test-time

layer 순서를 많이 permutate하면 에러가 올라감

6. The importance of short paths in residual networks

layer의 길이마다 모델에 미치는 영향을 연구
- path의 길이에 따라 미치는 영향이 다른지?

Distribution of path lengths
- n-layers 라고 한다면, 대부분이 n/2개 주변에 몰여있음
Vanishing gradients in residual networks
- 길이에 따라 gradient에 영향을 미치는 정도가 다르다.(짧은 것이 많은 영향)
The effective paths in residual networks are relatively shallow
- 영향을 많이 미치는 path가 effective path
- effective path는 상대적으로 얕은 path가 대부분이다.
- effective path들만으로 실험을 해도 성능에 큰 차이가 없다.(6.10% ->5.96%)
- 전체의 0.45%지만 모델의 학습에 큰 영향을 미친다.

7. Discussion

module을 몇 개 지우는 것은 주로 긴 path가 지워지는 것과 같다.
- n-layers에서 2개를 지우면 길이가 n, n-1인 path는 각각 1과 n개에서 0개가 된다.
- 길이가 n-2인 path는 n(n-1)/2개에서 1개가 된다.
- 즉, 길이가 긴 path는 비율이 거의 100%에 가깝게 확 줄어버린다.
- 반면에 중간 길이의 path는 $\binom{n}{n/2}$ 에서 $\binom{n-2}{n/2}$ 가 되므로 비교적 적게 사라진다.
- 요약하자면, fraction of remaing path of length x = $\frac{\binom{n-d}{x}}{\binom{n}{x}}$
highway network
- shortcut connection 쪽으로 $t$ 값이 치우친다.
stochastic depth training
- 모델을 training할 때, 전부 학습 시키는 것이 아니라 sampling한 것만 학습시키는 것
- 학습을 할 때, 짧은 path만 학습시키는 효과

8. Conclusion

“Thus, residual networks do not resolve the vanishing gradient problem by preserving gradient flow throughout the entire depth of the network.”

9. Reference

https://arxiv.org/abs/1605.06431
Ybigta deepNLP-study

DevelopHyun

resNet[3] Residual Networks Behave Like Ensembles of Relatively Shallow Networks(2016) - Review

1. Abstract

2. Introduction

2.1 Unraveled View

4. The unraveled view of residual networks

5. Lesion study(손상연구)

5.1 Deleting individual layers from neural networks at test time

5.2 Deleting many modules from residual networks at test-time

5.3 Reordering modules in residual networks at test-time

6. The importance of short paths in residual networks

7. Discussion

8. Conclusion

9. Reference

Related Posts

DevelopHyun

1. Abstract

2. Introduction

2.1 Unraveled View

3. Related Work

4. The unraveled view of residual networks

5. Lesion study(손상연구)

5.1 Deleting individual layers from neural networks at test time

5.2 Deleting many modules from residual networks at test-time

5.3 Reordering modules in residual networks at test-time

6. The importance of short paths in residual networks

7. Discussion

8. Conclusion

9. Reference

Related Posts