Paper: arXiv link
Authors' GitHub
Contributions
- simultaneously clusters the data online while learning representations
- "swapped" prediction: predict the code (cluster assignment) of one view from the other view
- memory efficient: no large memory bank and no pairwise feature comparisons
- multi-crop: mix two standard-resolution crops with several low-resolution crops
Loss
$$L(\boldsymbol{z}_{t},\boldsymbol{z}_{s})=\mathit{l}(\boldsymbol{z}_{t},\boldsymbol{q}_{s})+\mathit{l}(\boldsymbol{z}_{s},\boldsymbol{q}_{t})$$
Online clustering
$\boldsymbol{z}_{t}$ is passed through $\boldsymbol{C}$ (its role is similar to an fc layer, but it is trained differently from $f_{\theta}$).
Then the code $\boldsymbol{q}_{s}$ of the other view is predicted.
$$\mathit{l}(\boldsymbol{z}_{t},\boldsymbol{q}_{s})=-\sum_{k}\boldsymbol{q}_{s}^{(k)}\log\boldsymbol{p}_{t}^{(k)},\ \text{where }\boldsymbol{p}_{t}^{(k)}=\frac{\exp\left(\frac{1}{\tau}\boldsymbol{z}_{t}^{T}\boldsymbol{c}_{k}\right)}{\sum_{k'}\exp\left(\frac{1}{\tau}\boldsymbol{z}_{t}^{T}\boldsymbol{c}_{k'}\right)}.$$
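To make the sub-loss concrete, here is a minimal PyTorch sketch (not the authors' implementation); the tensor shapes and the names `prototypes` and `temperature` are assumptions.

```python
import torch
import torch.nn.functional as F

def subloss(z_t, q_s, prototypes, temperature=0.1):
    """l(z_t, q_s): cross-entropy between the code q_s and the softmax of z_t over the prototypes."""
    # z_t: (B, D) normalized features, prototypes: (K, D) = C, q_s: (B, K) codes of the other view
    scores = z_t @ prototypes.t() / temperature   # z_t^T c_k / tau
    log_p_t = F.log_softmax(scores, dim=1)        # log p_t^(k)
    return -(q_s * log_p_t).sum(dim=1).mean()     # -sum_k q_s^(k) log p_t^(k), averaged over the batch
```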
The full objective:
$$-\frac{1}{N}\sum_{n=1}^{N}\sum_{s,t\sim\mathscr{T}}\left[\frac{1}{\tau}\boldsymbol{z}_{nt}^{T}\boldsymbol{Cq}_{ns}+\frac{1}{\tau}\boldsymbol{z}_{ns}^{T}\boldsymbol{Cq}_{nt}-\log\sum_{k=1}^{K}\exp\left(\frac{\boldsymbol{z}_{nt}^{T}\boldsymbol{c}_{k}}{\tau}\right)-\log\sum_{k=1}^{K}\exp\left(\frac{\boldsymbol{z}_{ns}^{T}\boldsymbol{c}_{k}}{\tau}\right)\right].$$
Here the prototypes $\boldsymbol{C}$ and the parameters $\theta$ are learned jointly.
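As a rough sketch of how $\boldsymbol{C}$ can live in the model as an fc-like layer trained jointly with $f_{\theta}$ (module and dimension names here are hypothetical, not the authors' code):

```python
import torch.nn as nn
import torch.nn.functional as F

class PrototypeHead(nn.Module):
    """Projection head plus prototype layer; both receive gradients from the loss above."""
    def __init__(self, feat_dim=2048, proj_dim=128, num_prototypes=3000):
        super().__init__()
        self.projection = nn.Linear(feat_dim, proj_dim)
        # C as a bias-free linear layer: one weight row per prototype c_k
        self.prototypes = nn.Linear(proj_dim, num_prototypes, bias=False)

    def forward(self, feats):
        z = F.normalize(self.projection(feats), dim=1)  # unit-norm features z
        return z, self.prototypes(z)                    # z and the scores z^T C
```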
Computing codes online: computing $\boldsymbol{Q}$ (the batch of codes $\boldsymbol{q}_{s}$).
Goal: assign the batch's features to the prototypes $\boldsymbol{C}$ so that they are partitioned equally (an equal-partition constraint within the batch).
Following Asano et al. [2],
$$\max_{\boldsymbol{Q}\in\mathcal{Q}}\text{Tr}\left(\boldsymbol{Q}^{T}\boldsymbol{C}^{T}\boldsymbol{Z}\right)+\epsilon H\left(\boldsymbol{Q}\right),$$
$$\mathcal{Q}=\left\{ \boldsymbol{Q}\in\mathbb{R}_{+}^{K\times B}\,\middle|\,\boldsymbol{Q}\boldsymbol{1}_{B}=\frac{1}{K}\boldsymbol{1}_{K},\ \boldsymbol{Q}^{T}\boldsymbol{1}_{K}=\frac{1}{B}\boldsymbol{1}_{B}\right\},$$
We find a $\boldsymbol{Q}\in\mathcal{Q}$ that solves the problem above; rounding it to a discrete assignment gives hard codes.
However, hard codes perform worse, so soft codes are used instead [13]:
$$\boldsymbol{Q}^{*}=\text{Diag}\left(\boldsymbol{u}\right)\exp\left(\frac{\boldsymbol{C}^{T}\boldsymbol{Z}}{\epsilon}\right)\text{Diag}\left(\boldsymbol{v}\right),$$
where $\boldsymbol{u}\in\mathbb{R}^{K}$ and $\boldsymbol{v}\in\mathbb{R}^{B}$ are renormalization vectors.
The expression above is solved with a few iterations of the Sinkhorn-Knopp algorithm [13].
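A minimal sketch of those Sinkhorn-Knopp iterations (the epsilon value, iteration count, and function name are assumptions, not the authors' exact code):

```python
import torch

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iters=3):
    """scores: (B, K) = Z^T C for one batch; returns soft codes Q of shape (B, K)."""
    Q = torch.exp(scores / eps).t()        # (K, B), proportional to exp(C^T Z / eps)
    Q /= Q.sum()                           # start from a normalized matrix
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)    # rows: each prototype gets 1/K of the mass
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)    # columns: each sample carries 1/B of the mass
        Q /= B
    return (Q * B).t()                     # rescale so each code sums to 1
```

Alternately normalizing rows and columns is the Sinkhorn-Knopp fixed-point iteration that produces the $\text{Diag}\left(\boldsymbol{u}\right)\exp\left(\cdot\right)\text{Diag}\left(\boldsymbol{v}\right)$ form above.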
Multi-crop
Multiple views are used per image, each view $v$ producing its own feature $\boldsymbol{z}_{t_{v}}$.
$$L(\boldsymbol{z}_{t_{1}},\boldsymbol{z}_{t_{2}},\cdots,\boldsymbol{z}_{t_{V+2}})=\sum_{i\in\{1,2\}}\sum_{v=1}^{V+2}\boldsymbol{1}_{v\neq i}\mathit{l}(\boldsymbol{z}_{t_{v}},\boldsymbol{q}_{t_{i}}).$$
Two standard-resolution crops and $V$ additional low-resolution crops that cover only small parts of the image; codes are computed only from the two standard-resolution crops ($i\in\{1,2\}$ in the sum above), while all other views predict them.
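Putting the pieces together, a hedged sketch of the multi-crop swapped loss, reusing the `subloss` and `sinkhorn` sketches above (names and the averaging choice are assumptions):

```python
def multicrop_swav_loss(z_views, prototypes, temperature=0.1):
    """z_views: list of (B, D) feature tensors; the first two are the standard-resolution crops."""
    total, n_terms = 0.0, 0
    for i in (0, 1):                                    # codes only from the two standard-resolution crops
        q_i = sinkhorn(z_views[i] @ prototypes.t())     # sinkhorn runs under no_grad, so codes carry no gradient
        for v, z_v in enumerate(z_views):
            if v == i:
                continue                                # 1_{v != i}: a view never predicts its own code
            total += subloss(z_v, q_i, prototypes, temperature)
            n_terms += 1
    return total / n_terms
```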
Advantages over SimCLR
- SimCLR treats each image as an individual instance and pushes different images apart.
- This paper instead groups similar images through clustering and discriminates between the groups.
- SimCLR uses random crops,
- whereas this paper uses multi-crop.