Paper title:

This paper was motivated by the OOM (out-of-memory) failures that DL models frequently run into.

The figure below shows the kinds of memory that are always needed to build a deep learning model.
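Weight Tensor
- The category name is self-explanatory: the model's weights.
In/Out Tensor
- Initial Input: the first input loaded into memory
- Operator Input: an intermediate output that becomes the input of the next operation
- Forward Output: the result of forward propagation
- Output Gradient: produced during backpropagation. This is memory created in the middle of the computation.
Ephemeral Tensor (short-lived)
- cuDNN Workspace
  - cuDNN implements convolution with specialized algorithms that need extra scratch memory, and the workspace is that memory. (It spends memory to gain speed.)
- Temporary Tensor: padding is one example.
Resident Buffer (Section 4.5)
- CUDA Context
  - As far as I understand, this holds the management information needed to control and use GPU devices.
  - It is determined by exactly two things: the GPU SKU (Stock Keeping Unit) and the deep learning framework. It does not change when the model changes.
- Internal Tensor Fragmentation
  - Extra memory spent on alignment (I believe this is the padding added when a tensor's size is rounded up to the allocator's alignment boundary).
- Allocator Reservation
  - This is said to consist of released yet unreclaimed tensors, pre-allocated memory, and external tensor fragmentation, but I do not fully understand it.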
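To make the categories above concrete, here is a minimal sketch that tallies them for a single convolution layer. It is not the paper's method: the function names, the float32 assumption, and the 512-byte alignment are all illustrative assumptions, and Allocator Reservation is left out because the notes do not describe how to compute it.

```python
from math import prod

FLOAT32_BYTES = 4  # assumption: every tensor is float32


def tensor_bytes(shape, dtype_bytes=FLOAT32_BYTES):
    """Raw size of a dense tensor: element count times bytes per element."""
    return prod(shape) * dtype_bytes


def aligned_bytes(nbytes, alignment=512):
    """Round a request up to a multiple of `alignment` (512 B is an assumed value)."""
    return -(-nbytes // alignment) * alignment


def tally(weight_shapes, inout_shapes, ephemeral_bytes=0, cuda_context_bytes=0):
    """Sum the memory categories from the notes; returns byte counts per category."""
    weights = sum(tensor_bytes(s) for s in weight_shapes)
    inout = sum(tensor_bytes(s) for s in inout_shapes)
    # Internal fragmentation: the padding added when each tensor is rounded up.
    fragmentation = sum(
        aligned_bytes(tensor_bytes(s)) - tensor_bytes(s)
        for s in list(weight_shapes) + list(inout_shapes)
    )
    parts = {
        "weight": weights,
        "in_out": inout,
        "ephemeral": ephemeral_bytes,        # e.g. cuDNN workspace, supplied by the caller
        "cuda_context": cuda_context_bytes,  # fixed per GPU SKU and framework
        "internal_fragmentation": fragmentation,
    }
    parts["total"] = sum(parts.values())
    return parts


# Hypothetical layer: 64 filters of size 3x7x7 applied to one 3x224x224 image.
example = tally(
    weight_shapes=[(64, 3, 7, 7)],
    inout_shapes=[(1, 3, 224, 224), (1, 64, 112, 112)],
)
print(example)
```

In this toy case only the weight tensor is not already a multiple of 512 bytes, so it is the only one that contributes to the fragmentation term.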

$$
\text{error (\%)} = \frac{|\text{Est.} - \text{Real}|}{\text{Real}} \times 100
$$
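The error metric can be written as a small helper. This is a sketch only; the function name and the sample numbers are made up for illustration and are not results from the paper.

```python
def relative_error_percent(estimated, real):
    """Percentage error of an estimate: |Est. - Real| / Real * 100."""
    if real <= 0:
        raise ValueError("real memory usage must be positive")
    return abs(estimated - real) / real * 100


# Hypothetical values in MiB: estimated 750, measured 1000.
print(relative_error_percent(750, 1000))
```

Taking the absolute value means overestimates and underestimates are penalized equally.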

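SI (Shape Inference): the paper refers to another paper for the details.
- Measures memory usage using only the shapes computed along the way
- Derives each output's shape from values such as the batch size, input tensor shape, the filter number,
  and so on, then computes the size from it, much like working through Table 3.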
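As an illustration of the idea, the sketch below infers the output shape of a 2D convolution from the batch size, input shape, and filter count, then converts that shape into bytes without running the model. It uses the standard convolution output-size formula. The function names, the NCHW layout, and the float32 element size are assumptions for this example, not details taken from the paper.

```python
from math import prod


def conv2d_output_shape(input_shape, num_filters, kernel_size, stride=1, padding=0):
    """Infer the NCHW output shape of a square-kernel 2D convolution."""
    n, _, h, w = input_shape
    h_out = (h + 2 * padding - kernel_size) // stride + 1
    w_out = (w + 2 * padding - kernel_size) // stride + 1
    return (n, num_filters, h_out, w_out)


def tensor_bytes(shape, dtype_bytes=4):
    """Size in bytes of a dense tensor (4 bytes per element assumes float32)."""
    return prod(shape) * dtype_bytes


# Hypothetical first layer: batch of 32 RGB 224x224 images, 64 filters of 7x7.
out_shape = conv2d_output_shape(
    (32, 3, 224, 224), num_filters=64, kernel_size=7, stride=2, padding=3
)
print(out_shape, tensor_bytes(out_shape))
```

Chaining this over every operator gives the size of each In/Out tensor from shapes alone.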

The figure below shows the architecture of DNNMem, the tool proposed in the paper (from Microsoft) for estimating the GPU memory a deep learning model will use. I did not look closely at how it works internally; see the paper.
