Mutual Information Neural Estimation

ABSTRACT

The goal is to estimate mutual information (MI) by training a neural network with gradient descent.

BACKGROUND

2.2 Dual representation of the KL-divergence

THE MINE

3.1 Method

$D_{KL}(\mathbb P\space||\space\mathbb Q) \ge \underset{T \in \mathcal F}\sup \space\mathbb E_{\mathbb P}[T] - \log(\mathbb E_{\mathbb Q}[e^T]).$
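A quick numerical check of this bound, as a sketch on two made-up three-point distributions $P$ and $Q$. The distributions and the helper names `dv_bound` and `kl` are illustrative assumptions, not from the paper. Choosing $T = \log(dP/dQ)$ attains the KL-divergence, and any other $T$ stays below it.

```python
import math

def dv_bound(p, q, t):
    """Donsker-Varadhan objective E_P[T] - log E_Q[e^T] for discrete P, Q."""
    e_p = sum(pi * ti for pi, ti in zip(p, t))
    log_e_q = math.log(sum(qi * math.exp(ti) for qi, ti in zip(q, t)))
    return e_p - log_e_q

def kl(p, q):
    """KL(P || Q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
t_star = [math.log(pi / qi) for pi, qi in zip(p, q)]  # optimal T = log(dP/dQ)
t_other = [1.0, 0.0, -1.0]                            # an arbitrary T

print(kl(p, q))                 # ~0.2749
print(dv_bound(p, q, t_star))   # equals the KL divergence (up to float rounding)
print(dv_bound(p, q, t_other))  # ~0.2728, strictly below the KL divergence
```

The gap for `t_other` is small here only because it happens to be close in shape to `t_star`; a poorly chosen $T$ gives a much looser bound.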

In the dual representation of the KL-divergence, $\mathcal F$ is chosen to be the family of functions $T_\theta : \mathcal X \times \mathcal Z \to \mathbb R$, where each $T_\theta$ is parameterized by a deep neural network (DNN) with parameters $\theta \in \Theta$.

This network is called the $statistics\space network$.
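A minimal sketch of what a statistics network could look like, assuming a one-hidden-layer tanh MLP over the concatenated pair $(x, z)$. The class name `StatisticsNetwork`, the layer sizes and the initialization are hypothetical choices, and plain Python is used only to keep the sketch dependency-free. The one essential property is that it maps a pair $(x, z)$ to a single real number.

```python
import math
import random

class StatisticsNetwork:
    """Hypothetical one-hidden-layer MLP T_theta: X x Z -> R."""

    def __init__(self, dim_x, dim_z, hidden=16, seed=0):
        rng = random.Random(seed)
        d_in = dim_x + dim_z
        scale = 1.0 / math.sqrt(d_in)
        # theta = (W1, b1, w2, b2), randomly initialized
        self.W1 = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, x, z):
        v = list(x) + list(z)  # the network sees the pair (x, z) jointly
        h = [math.tanh(sum(w * vi for w, vi in zip(row, v)) + b)
             for row, b in zip(self.W1, self.b1)]
        # a single real-valued output, as required by T_theta: X x Z -> R
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

T = StatisticsNetwork(dim_x=2, dim_z=1)
print(T([0.3, -1.2], [0.7]))
```

In practice this would be written in an autodiff framework so that $\theta$ can be trained by gradient descent; the structure (concatenate, hidden layers, scalar output) is the part that carries over.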

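Since $I(X;Z) = D_{KL}(\mathbb P_{X\space Z}\space||\space\mathbb P_X \otimes \mathbb P_Z)$, applying the bound above with this family gives a lower bound on the MI:

$I(X;Z) \ge I_\Theta (X,Z),$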

where $I_\Theta (X,Z)$ is the $neural\space information\space measure$, defined as

$I_\Theta(X,Z) = \underset{\theta\in\Theta}\sup \space \mathbb E_{\mathbb P_{X\space Z}} [T_\theta] - \log(\mathbb E_{\mathbb P_X \otimes\mathbb P_Z}[e^{T_\theta}]).$
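To see why the supremum over $\Theta$ gives only a lower bound, consider a toy case (an assumption for illustration, not an experiment from the paper): $(X, Z)$ is a standard bivariate Gaussian with correlation $\rho$, whose true MI is $-\tfrac12\log(1-\rho^2)$, and $\Theta$ is the hypothetical one-parameter family $T_\theta(x,z)=\theta xz$. Then $\mathbb E_{\mathbb P_{X\space Z}}[T_\theta]=\theta\rho$ and, for $|\theta|<1$, $\mathbb E_{\mathbb P_X\otimes\mathbb P_Z}[e^{T_\theta}]=(1-\theta^2)^{-1/2}$, so the objective is $\theta\rho+\tfrac12\log(1-\theta^2)$. The sketch maximizes it by gradient ascent on $\theta$ using these closed-form expectations.

```python
import math

def objective(theta, rho):
    # E_P[theta*X*Z] - log E_{P_X x P_Z}[exp(theta*X*Z)], closed form for |theta| < 1
    return theta * rho + 0.5 * math.log(1.0 - theta ** 2)

def grad(theta, rho):
    # derivative of the objective with respect to theta
    return rho - theta / (1.0 - theta ** 2)

def neural_information_measure(rho, lr=0.1, steps=2000):
    theta = 0.0
    for _ in range(steps):
        theta += lr * grad(theta, rho)  # gradient ascent on the lower bound
    return theta, objective(theta, rho)

rho = 0.5
theta, i_theta = neural_information_measure(rho)
true_mi = -0.5 * math.log(1.0 - rho ** 2)
print(theta, i_theta, true_mi)  # ~0.414, ~0.113, ~0.144
```

For $\rho = 0.5$ the restricted family reaches $I_\Theta \approx 0.113$ against a true MI of about $0.144$. The optimal $T$ for this Gaussian also contains $x^2$ and $z^2$ terms, which $\theta xz$ cannot express, so the gap is the price of the restricted $\Theta$; a richer network family can close it.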

The expectations are estimated either from empirical samples of $\mathbb P_{X\space Z}$ and $\mathbb P_X \otimes \mathbb P_Z$, or by shuffling the joint samples along the batch axis to obtain samples from the product of marginals.
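A sketch of the sample-based estimate with the batch-shuffling trick, assuming a toy standard bivariate Gaussian with $\rho = 0.5$ and a statistics network fixed to the hypothetical $T(x,z)=\theta xz$ with $\theta=\sqrt2-1$; for this choice the exact objective is $\theta\rho+\tfrac12\log(1-\theta^2)\approx 0.113$. The helper names are illustrative. Shuffling $z$ within the batch breaks its pairing with $x$ while keeping each marginal intact, so the shuffled pairs stand in for samples from $\mathbb P_X\otimes\mathbb P_Z$.

```python
import math
import random

def sample_joint(n, rho, rng):
    # toy joint distribution: standard bivariate Gaussian with correlation rho
    xs, zs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(z)
    return xs, zs

def marginal_by_shuffle(zs, rng):
    # permute z along the batch axis; pairs (x_i, z_pi(i)) approximate P_X x P_Z
    shuffled = list(zs)
    rng.shuffle(shuffled)
    return shuffled

def mine_lower_bound(T, xs, zs, zs_marg):
    n = len(xs)
    joint_term = sum(T(x, z) for x, z in zip(xs, zs)) / n               # E_{P_XZ}[T]
    marg_mean = sum(math.exp(T(x, z)) for x, z in zip(xs, zs_marg)) / n  # E_{P_X x P_Z}[e^T]
    return joint_term - math.log(marg_mean)

rng = random.Random(0)
rho, theta = 0.5, math.sqrt(2.0) - 1.0

def T(x, z):
    return theta * x * z  # hypothetical fixed statistics network

xs, zs = sample_joint(20000, rho, rng)
zs_marg = marginal_by_shuffle(zs, rng)
est = mine_lower_bound(T, xs, zs, zs_marg)
print(est)  # close to the exact value of about 0.113 for this theta
```

In MINE proper, $T$ would be a trainable network and this estimate would be recomputed on each mini-batch as the objective for gradient ascent on $\theta$.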