SwAV, SEER: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

2021. 3. 23. 14:32 · Deep Learning

Paper arXiv link

Authors' GitHub

SwAV Illustration

Contribution

  1. simultaneously clusters the data while enforcing consistency between the cluster assignments of different augmentations (views) of the same image
  2. "swapped" prediction: the code of one view is predicted from the representation of the other view
  3. memory efficient: works online with small batches, without a large memory bank or a momentum encoder
  4. multi-crop: a mix of views with different resolutions instead of two full-resolution views
  • Loss: for two views $t$ and $s$ of the same image, with features $\boldsymbol{z}_{t},\boldsymbol{z}_{s}$ and codes $\boldsymbol{q}_{t},\boldsymbol{q}_{s}$,

$$L(\boldsymbol{z}_{t},\boldsymbol{z}_{s})=\mathit{l}(\boldsymbol{z}_{t},\boldsymbol{q}_{s})+\mathit{l}(\boldsymbol{z}_{s},\boldsymbol{q}_{t})$$

Online clustering

figure 1

$\boldsymbol{z}_{t}$ is passed through the prototype matrix $\boldsymbol{C}$ (it plays a role similar to an fc layer, but it is trained differently from $f_{\theta}$).

Then the code $\boldsymbol{q}_{s}$ of the other view is predicted from it:

$$\mathit{l}(\boldsymbol{z}_{t},\boldsymbol{q}_{s})=-\sum_{k}\boldsymbol{q}_{s}^{(k)}\log\boldsymbol{p}_{t}^{(k)},\ \text{where }\boldsymbol{p}_{t}^{(k)}=\frac{\exp\left(\frac{1}{\tau}\boldsymbol{z}_{t}^{T}\boldsymbol{c}_{k}\right)}{\sum_{k'}\exp\left(\frac{1}{\tau}\boldsymbol{z}_{t}^{T}\boldsymbol{c}_{k'}\right)}.$$
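For reference, a minimal PyTorch sketch of this swapped cross-entropy term, assuming the features $\boldsymbol{z}$ are already L2-normalized and the codes $\boldsymbol{q}$ have already been computed (how to compute them comes below); the tensor names and shapes are my own, not the authors' code:

```python
import torch
import torch.nn.functional as F

def subloss(z_t, q_s, prototypes, temperature=0.1):
    """l(z_t, q_s): cross-entropy between the code of view s and the
    softmax over prototype scores of view t.
    z_t:        (B, D) L2-normalized features of view t
    q_s:        (B, K) codes of view s (each row sums to 1)
    prototypes: (K, D) prototype vectors c_1..c_K
    """
    logits = z_t @ prototypes.T / temperature   # (B, K): z_t^T c_k / tau
    log_p_t = F.log_softmax(logits, dim=1)      # log p_t^(k)
    return -(q_s * log_p_t).sum(dim=1).mean()   # -sum_k q_s^(k) log p_t^(k)

def swapped_loss(z_t, z_s, q_t, q_s, prototypes, temperature=0.1):
    """L(z_t, z_s) = l(z_t, q_s) + l(z_s, q_t)."""
    return (subloss(z_t, q_s, prototypes, temperature)
            + subloss(z_s, q_t, prototypes, temperature))
```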

 

 

The full objective over all images and pairs of augmentations:

$$-\frac{1}{N}\sum_{n=1}^{N}\sum_{s,t\sim\mathscr{T}}\left[\frac{1}{\tau}\boldsymbol{z}_{nt}^{T}\boldsymbol{Cq}_{ns}+\frac{1}{\tau}\boldsymbol{z}_{ns}^{T}\boldsymbol{Cq}_{nt}-\log\sum_{k=1}^{K}\exp\left(\frac{\boldsymbol{z}_{nt}^{T}\boldsymbol{c}_{k}}{\tau}\right)-\log\sum_{k=1}^{K}\exp\left(\frac{\boldsymbol{z}_{ns}^{T}\boldsymbol{c}_{k}}{\tau}\right)\right].$$

 

Here $\boldsymbol{C}$ and $\theta$ are learned jointly: the loss above is minimized with respect to both.
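A rough sketch of how $\boldsymbol{C}$ and $\theta$ can sit in one module and be updated by the same optimizer; the layout below (ResNet-50 backbone, single linear projection head, hypothetical class name) is a simplification for illustration, not the authors' exact code:

```python
import torch
import torch.nn as nn
import torchvision

class SwAVModel(nn.Module):
    def __init__(self, feat_dim=128, num_prototypes=3000):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.fc = nn.Identity()                   # f_theta: image -> 2048-d feature
        self.backbone = backbone
        self.projection = nn.Linear(2048, feat_dim)   # projection head (simplified)
        # the prototypes C: an fc-like layer whose weight rows are c_1..c_K
        self.prototypes = nn.Linear(feat_dim, num_prototypes, bias=False)

    def forward(self, x):
        z = nn.functional.normalize(self.projection(self.backbone(x)), dim=1)
        return z, self.prototypes(z)                  # features z and scores z^T c_k

model = SwAVModel()
# C (model.prototypes.weight) and theta are updated by the same optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.6, momentum=0.9, weight_decay=1e-6)
```

The released implementation also L2-normalizes the prototype weights during training, so the scores $\boldsymbol{z}^{T}\boldsymbol{c}_{k}$ stay on the scale of cosine similarities.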

Computing codes online: how to compute $\boldsymbol{Q}$, the matrix whose columns are the codes $\boldsymbol{q}_{1},\dots,\boldsymbol{q}_{B}$ of the current batch.

Goal: within a batch, partition the features equally among the prototypes in $\boldsymbol{C}$ (equipartition), which prevents the trivial solution where every image is mapped to the same code.

Following Asano et al. [2], the codes are obtained by solving

$$\max_{\boldsymbol{Q}\in\mathit{Q}}\text{Tr}\left(\boldsymbol{Q}^{T}\boldsymbol{C}^{T}\boldsymbol{Z}\right)+\epsilon H\left(\boldsymbol{Q}\right),$$

 

$$\mathit{Q}=\left\{ \boldsymbol{Q}\in\mathbb{R}_{+}^{K\times B}|\boldsymbol{Q}\boldsymbol{1}_{B}=\frac{1}{K}\boldsymbol{1}_{K},\boldsymbol{Q}^{T}\boldsymbol{1}_{K}=\frac{1}{B}\boldsymbol{1}_{B}\right\} ,$$

 

It suffices to find a $\boldsymbol{Q}\in\mathit{Q}$ that maximizes the objective above; rounding it gives a discrete assignment (a hard code).

However, hard codes perform worse in this online (mini-batch) setting, so the soft (continuous) codes are used directly; the solution takes the form [13]

$$\boldsymbol{Q}^{*}=\text{Diag}\left(\boldsymbol{u}\right)\exp\left(\frac{\boldsymbol{C}^{T}\boldsymbol{Z}}{\epsilon}\right)\text{Diag}\left(\boldsymbol{v}\right),$$

where $\boldsymbol{u}\in\mathbb{R}^{K}$ and $\boldsymbol{v}\in\mathbb{R}^{B}$ are renormalization vectors.

These renormalization vectors are computed with a few iterations of the Sinkhorn-Knopp algorithm [13].
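A compact single-GPU sketch of that iteration (the released code does the same thing with distributed sums); $\epsilon=0.05$ and 3 iterations are the values used in the paper:

```python
import torch

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iters=3):
    """Compute soft codes Q* from the prototype scores C^T Z.
    scores: (K, B) matrix of dot products c_k^T z_b.
    Returns a (B, K) matrix whose rows are the codes q_1..q_B."""
    Q = torch.exp(scores / eps)          # exp(C^T Z / eps)
    Q /= Q.sum()                         # start from a joint probability matrix
    K, B = Q.shape
    for _ in range(n_iters):
        # row normalization: every prototype receives 1/K of the total mass
        Q /= Q.sum(dim=1, keepdim=True)
        Q /= K
        # column normalization: every sample receives 1/B of the total mass
        Q /= Q.sum(dim=0, keepdim=True)
        Q /= B
    return (Q * B).T                     # columns sum to 1, so each row is one code
```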

Multi-crop

Multiple views of each image (and hence multiple features $\boldsymbol{z}_{t_{v}}$) are used instead of just two.

 

$$L(\boldsymbol{z}_{t_{1}},\boldsymbol{z}_{t_{2}},\cdots,\boldsymbol{z}_{t_{V+2}})=\sum_{i\in\{1,2\}}\sum_{v=1}^{V+2}\boldsymbol{1}_{v\neq i}\mathit{l}(\boldsymbol{z}_{t_{v}},\boldsymbol{q}_{t_{i}}).$$

 

Two standard-resolution crops and V additional low-resolution crops that cover only small parts of the image are used; as the sum over $i\in\{1,2\}$ shows, the codes $\boldsymbol{q}_{t_{i}}$ are computed from the two full-resolution views only.
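Putting the pieces together, a sketch of the multi-crop objective built from the `subloss` and `sinkhorn` sketches above; here `prototypes` is the (K, D) matrix of prototype vectors, and the terms are averaged rather than summed purely for readability:

```python
def multicrop_loss(views, prototypes, temperature=0.1, eps=0.05):
    """views: list of V+2 L2-normalized feature tensors z_{t_1}..z_{t_{V+2}},
    each of shape (B, D); views[0] and views[1] are the full-resolution crops.
    Relies on subloss(...) and sinkhorn(...) from the sketches above."""
    total, n_terms = 0.0, 0
    for i in (0, 1):                                       # codes only from full-res crops
        q_i = sinkhorn(prototypes @ views[i].T, eps=eps)   # (B, K) codes q_{t_i}
        for v, z_v in enumerate(views):
            if v == i:                                     # indicator 1_{v != i}
                continue
            total += subloss(z_v, q_i, prototypes, temperature)
            n_terms += 1
    return total / n_terms
```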

 

Advantages over SimCLR

  1. SimCLR treats every image as its own instance and pushes all other images away.
    1. This paper instead groups similar images through clustering and discriminates between the groups.
  2. SimCLR uses only random crops at a single resolution,
    1. while this paper uses multi-crop.

 
