1. Abstract

Inception은 regular convolution과 depthwise separable convolution은 중간 단계
Xception vs depthwise separable convolution
- depthwise separable convolution : depthwise convolution -> pointwise convolution - Inception : pointwise convolution -> nonlinear activation -> depthwise convolution
- Xception : pointwise convolution -> depthwise convolution
- depthwise convolution : filter의 depth = 1, 즉, spatial방향으로만 convolution
- pointwise convolution : filter의 width&height = 1, 즉, channel방향으로만 convolution
- 따라서, depthwise convolution은 공간 방향의 정보를 취합하는 것이고, pointwise convolution은 깊이 방향의 정보를 취합하는 것

2. Introduction

2.1 The Inception hypothesis

기존의 convolution의 filter는 3D 공간에 존재
- width, height로 구성되는 2개의 spatial dimension과 depth를 의미하는 channel dimension
- 따라서, cross-channel correlation과 spatial correlation을 동시에 연산
Inception에서는 cross-channel correlation과 spatial correlation을 분리
- 먼저 1x1 convolution을 통해 depth차원을 줄여준 후에 3D형태의 filter로 연산

2.2 The continuum between convolutions and separable convolutions

xception

xception module은 위와 같이 inception module을 변형하여 만들어짐
- 기존의 inception 모듈을 간단하게 변형
- Figure2를 보면 1x1 convolution이 각각 연산되는데, 이것을 모두 통합
- Figure3처럼 깊이에 따라서, 다른 convolution을 적용
- Figure3의 1x1 convolution의 결과물의 그림은 가로 방향이 depth를 의미
- 더 많은 convolution을 적용해주기 위하여 각 convolution의 filter depth를 줄여줌
결국, Xception module은 depthwise separable convolution은 유사
- 차이점 : depthwise convolution & pointwise convolution의 순서와 nonlinear operation의 존재유무
- Xception의 경우 각 convolution이 담당하는 depth가 줄어들었기 때문에 depthwise convolution에 가까워졌을 뿐만 아니라, 덕분에 처음에 수행되는 1x1 convolution의 역할이 더 구분됨. 즉, Inception보다 cross-channel correlation과 spatial correlation을 더 독립적으로 연산

3. Prior work

VGG Net
Incpetion
Depthwise separable convolution
Residual connection

4. The Xception architecture

xception

resNet의 shortcut path가 적용된 Xception module을 쌓은 model

5.Experimental evaluation

5.1 The JFT dataset

google의 dataset
Inception의 경우, ImageNet에 맞게 tuning이 되어있으므로, JFT dataset으로 공정하게 비교 가능

5.2 Optimization configuration

ImageNet와 JFT에 다른 configuration
Xception의 경우 Inception과 다르게 hyperparameter를 정교하게 tuning하지 않음

5.3 Regularization configuration

Weight decay
Dropout
Auxiliary loss tower
- Xception에서는 사용하지 않음

5.4 Training infrastructure

GPU를 통한 병렬처리
- synchronous gradient descent

5.5 Comparison with Inception V3

5.5.1 Classification performance

result

Xception이 더 좋은 성능을 보여줌

5.5.2 Size and speed

Xception

비슷한 parameter의 수를 가졌음에도 Xception이 더 좋은 성능

5.6 Effect of the residual connections

res

resNet을 적용해주는 것이 더 빠르게 수렴하며, 더 좋은 성능

5.7 Effect of an intermediate activation after pointwise convolutions

activation

pointwise convolution과 depthwise convolution 사이에 nonlinear activation이 없는 것이 더 좋은 성능
- Inception과 같이 깊은 depth에 convolution이 적용되면 좋은 성능을 보여주지만, Xception같이 얕은 depth에 적용되는 convolution의 경우에는 안좋은 성능을 보여줌
- non-linearity가 정보의 손실을 발생시킬 수 있음

6. Future directions

현재는 실험적으로 depthwise separable convolution이 regular convolution보다 더 좋은 성능을 보여주지만, 다른 구조가 있는지 더 연구해봐야 함

7. Conclusions

depthwise separable convolution에 더 가까운 convolution이 적용된 Xcpetion이 regular convolution에 가까운 Inception보다 더 좋은 성능을 보여준다.

DevelopHyun

Xception[1] Xception: Deep Learning with Depthwise Separable Convolutions(2017) - Review

1. Abstract

2. Introduction

2.1 The Inception hypothesis

2.2 The continuum between convolutions and separable convolutions

3. Prior work

4. The Xception architecture

5.Experimental evaluation

5.1 The JFT dataset

5.2 Optimization configuration

5.3 Regularization configuration

5.4 Training infrastructure

5.5 Comparison with Inception V3

5.5.1 Classification performance

5.5.2 Size and speed

5.6 Effect of the residual connections

5.7 Effect of an intermediate activation after pointwise convolutions

6. Future directions

7. Conclusions

8. References

Related Posts