# 2016-ICLR-Density Modeling of Images using a Generalized Normalization Transformation

## 2. Introduction

• PCA (Principal Component Analysis) (Jolliffe, 2002) [2]

• ICA (Independent Component Analysis) (Cardoso, 2003) [3]

• RG (Radial Gaussianization) (Lyu & Simoncelli, 2009b) [4]; (Sinz & Bethge, 2010) [5]

## 3. Gaussianizing Data

$p_{\boldsymbol{x}}(\boldsymbol{x}) = \left\lvert \frac{\partial g(\boldsymbol{x}; \boldsymbol{\theta})}{\partial \boldsymbol{x}}\right\rvert \ p_{\boldsymbol{y}}(g(\boldsymbol{x}; \boldsymbol{\theta})) \tag{1}$
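Equation $(1)$ can be checked numerically on a one-dimensional toy case. The sketch below assumes a hypothetical affine map $g(x) = ax + b$ (not a transformation from the paper) and a standard normal $p_y$; the induced density should then coincide with that of $\mathcal{N}(-b/a,\, 1/a^2)$.

```python
import math

def std_normal_pdf(y):
    """Density of the standard normal N(0, 1)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2)."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))

def induced_density(x, a, b):
    """Equation (1) for the hypothetical map g(x) = a*x + b: |dg/dx| * p_y(g(x))."""
    return abs(a) * std_normal_pdf(a * x + b)

# Illustrative values only.
a, b, x = 0.5, 1.0, 0.3
lhs = induced_density(x, a, b)
# If y = a*x + b is standard normal, then x is N(-b/a, 1/a^2).
rhs = normal_pdf(x, mean=-b / a, std=1.0 / abs(a))
print(lhs, rhs)
```

Both printed values should agree up to floating-point rounding.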

\begin{align*} J(p_{\boldsymbol{y}}) & = \mathbb{E}_{\boldsymbol{y}} (\log{p_{\boldsymbol{y}}}(\boldsymbol{y}) - \log{\mathcal{N}(\boldsymbol{y})}) \\ & = \mathbb{E}_{\boldsymbol{x}} \left(\log{p_{\boldsymbol{x}}}(\boldsymbol{x}) - \log{\left\lvert \frac{\partial g(\boldsymbol{x};\boldsymbol{\theta})}{\partial \boldsymbol{x}} \right\rvert} - \log{\mathcal{N}(g(\boldsymbol{x}; \boldsymbol{\theta}))}\right) \tag{2} \end{align*}

Differentiating $J(p_{\boldsymbol{y}})$ with respect to $\boldsymbol{\theta}$ gives

\begin{align*} \frac{\partial J(p_{\boldsymbol{y}})}{\partial \boldsymbol{\theta}} = \mathbb{E}_{\boldsymbol{x}} \left( -\sum_{ij} \left[ \frac{\partial g(\boldsymbol{x}; \boldsymbol{\theta})}{\partial \boldsymbol{x}} \right]_{ij}^{-T} \frac{\partial^2 g_i(\boldsymbol{x}; \boldsymbol{\theta})}{\partial x_j \partial \boldsymbol{\theta}} + \sum_{i} g_i(\boldsymbol{x}; \boldsymbol{\theta}) \frac{\partial g_i(\boldsymbol{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right) \tag{3} \end{align*}

$\Delta J \equiv J(p_{\boldsymbol{y}}) - J(p_{\boldsymbol{x}}) = \mathbb{E}_{\boldsymbol{x}} \left( \frac{1}{2} \lVert \boldsymbol{y} \rVert_2^2 - \log{\left\lvert \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \right\rvert} - \frac{1}{2} \lVert \boldsymbol{x} \rVert_2^2 \right) \tag{4}$

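Equation $(4)$ quantifies how far the transformed data $\boldsymbol{y}$ has been Gaussianized relative to the original data $\boldsymbol{x}$: the more negative $\Delta J$ is, the closer $\boldsymbol{y}$ is to a standard normal than $\boldsymbol{x}$.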
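As a minimal sketch of how equation $(4)$ can be estimated from samples, the example below uses a one-dimensional toy setup that is an assumption for illustration, not an experiment from the paper: $x \sim \mathcal{N}(0, s^2)$ and the rescaling $y = x / s$. Here $y$ is exactly standard normal, so $\Delta J$ has the closed form $\frac{1}{2} + \log s - \frac{1}{2} s^2$.

```python
import math
import random

def delta_j_estimate(samples, g, log_abs_det_jac):
    """Monte Carlo estimate of equation (4) for scalar data.

    g maps x to y; log_abs_det_jac(x) returns log|dy/dx| at x.
    """
    total = 0.0
    for x in samples:
        y = g(x)
        total += 0.5 * y * y - log_abs_det_jac(x) - 0.5 * x * x
    return total / len(samples)

random.seed(0)
s = 2.0          # standard deviation of the toy data (illustrative)
a = 1.0 / s      # hypothetical whitening transform y = a * x
samples = [random.gauss(0.0, s) for _ in range(20000)]

estimate = delta_j_estimate(
    samples,
    g=lambda x: a * x,
    log_abs_det_jac=lambda x: math.log(abs(a)),
)
# Closed form: E[y^2]/2 - log|a| - E[x^2]/2 with E[y^2] = 1 and E[x^2] = s^2.
closed_form = 0.5 + math.log(s) - 0.5 * s * s
print(estimate, closed_form)
```

The estimate is negative, which matches the reading of $(4)$: rescaling to unit variance moves the data closer to a standard normal.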

## 4. Divisive Normalization

$y_i = \gamma \frac{x_i^\alpha}{\beta^\alpha + \sum_j x_j^\alpha} \tag{5}$

\begin{align*} \boldsymbol{y} = g(\boldsymbol{x};\boldsymbol{\theta}) && \text{ s.t. } &&& y_i = \frac{z_i}{(\beta_i + \sum_j \gamma_{ij} |z_j|^{\alpha_{ij}})^{\varepsilon_i}}\\ && \text{ and } &&& \boldsymbol{z} = \boldsymbol{H} \boldsymbol{x} \tag{6} \end{align*}

$\frac{\partial y_i}{\partial z_k} = \frac{\delta_{ik}}{(\beta_i + \sum_j \gamma_{ij} |z_j|^{\alpha_{ij}})^{\varepsilon_i}} - \frac{\alpha_{ik} \gamma_{ik} \varepsilon_i z_i |z_k|^{\alpha_{ik} - 1} \mathrm{sgn}(z_k)}{(\beta_i + \sum_j \gamma_{ij} |z_{j}|^{\alpha_{ij}})^{\varepsilon_i + 1}} \tag{7}$
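The normalization stage of equation $(6)$ and its Jacobian $(7)$ can be sketched in plain Python and cross-checked against central finite differences. All parameter values below are hypothetical, chosen only for illustration, and the function names are not from the paper. The analytic formula assumes every $z_k \neq 0$ (or $\alpha_{ik} \geq 1$), since $|z_k|^{\alpha_{ik}-1}$ is otherwise undefined.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def denominator(i, z, beta, gamma, alpha):
    """beta_i + sum_j gamma_ij * |z_j|^alpha_ij, the base of the divisor in (6)."""
    return beta[i] + sum(gamma[i][j] * abs(z[j]) ** alpha[i][j] for j in range(len(z)))

def gdn_forward(z, beta, gamma, alpha, eps):
    """Normalization stage of equation (6): y_i = z_i / denominator_i^eps_i."""
    return [z[i] / denominator(i, z, beta, gamma, alpha) ** eps[i] for i in range(len(z))]

def gdn_jacobian(z, beta, gamma, alpha, eps):
    """Analytic Jacobian dy_i/dz_k from equation (7)."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = denominator(i, z, beta, gamma, alpha)
        for k in range(n):
            first = (1.0 if i == k else 0.0) / d ** eps[i]
            second = (alpha[i][k] * gamma[i][k] * eps[i] * z[i]
                      * abs(z[k]) ** (alpha[i][k] - 1.0) * sign(z[k])) / d ** (eps[i] + 1.0)
            jac[i][k] = first - second
    return jac

def finite_difference_jacobian(z, beta, gamma, alpha, eps, h=1e-6):
    """Central-difference approximation of dy_i/dz_k, used only as a check."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for k in range(n):
        z_plus, z_minus = list(z), list(z)
        z_plus[k] += h
        z_minus[k] -= h
        y_plus = gdn_forward(z_plus, beta, gamma, alpha, eps)
        y_minus = gdn_forward(z_minus, beta, gamma, alpha, eps)
        for i in range(n):
            jac[i][k] = (y_plus[i] - y_minus[i]) / (2.0 * h)
    return jac

# Hypothetical two-dimensional parameters (not values from the paper).
beta = [1.0, 0.5]
gamma = [[0.5, 0.1], [0.2, 0.4]]
alpha = [[2.0, 1.5], [1.5, 2.0]]
eps = [0.4, 0.3]
z = [0.8, -0.6]

analytic = gdn_jacobian(z, beta, gamma, alpha, eps)
numeric = finite_difference_jacobian(z, beta, gamma, alpha, eps)
max_error = max(abs(analytic[i][k] - numeric[i][k]) for i in range(2) for k in range(2))
print(max_error)
```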

$\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} \cdot \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \tag{8}$

That is, $J_{\boldsymbol{y} \rightarrow \boldsymbol{x}} = J_{\boldsymbol{z} \rightarrow \boldsymbol{x}} \cdot J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}$, where $J_{\boldsymbol{y} \rightarrow \boldsymbol{x}}$ denotes the Jacobian matrix of $\boldsymbol{y}$ with respect to $\boldsymbol{x}$, and similarly for the others. For $J_{\boldsymbol{y} \rightarrow \boldsymbol{x}}$ to be invertible (nonsingular), its determinant must satisfy $|J_{\boldsymbol{y} \rightarrow \boldsymbol{x}}| \neq 0$. Because the determinant of a product is the product of the determinants, this means $|J_{\boldsymbol{z} \rightarrow \boldsymbol{x}}| \cdot |J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}| \neq 0$, which requires both $|J_{\boldsymbol{z} \rightarrow \boldsymbol{x}}| \neq 0$ and $|J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}| \neq 0$:

• For $|J_{\boldsymbol{z} \rightarrow \boldsymbol{x}}| \neq 0$, i.e. $|H| \neq 0$, it suffices that the parameter matrix $H$ be nonsingular;

• For $|J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}| \neq 0$, the authors give a sufficient condition, namely that $J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}$ be positive definite, which is ultimately ensured through the choice of initial parameters. In addition, to make the transformation in $(6)$ easy to invert, the authors require the univariate mapping $y_i \rightarrow z_i$ to be invertible. From $(6)$ we have:

$|y_i| = \frac{|z_i|}{(\beta_i + \sum_{j} \gamma_{ij} |z_j|^{\alpha_{ij}})^{\varepsilon_i}} \leq \frac{|z_i|}{\gamma_{ii}^{\varepsilon_i} |z_i|^{\alpha_{ii} \varepsilon_i}} = \gamma_{ii}^{-\varepsilon_i} |z_i|^{1-\alpha_{ii} \varepsilon_i} \tag{9}$

Since the univariate mapping $y_i \rightarrow z_i$ is continuous and invertible, $y_i$ must be monotonic in $z_i$, which requires $1 - \alpha_{ii} \varepsilon_i \geq 0$, i.e. $\varepsilon_i \leq \alpha_{ii}^{-1}$.

$\frac{\partial y_i}{\partial z_k} = \begin{cases} 0 & i \neq k \\ \frac{\beta_i + \gamma_{ii}|z_i|^{\alpha_{ii}}(1 - \alpha_{ii} \varepsilon_i)}{(\beta_i + \gamma_{ii} |z_i|^{\alpha_{ii}})^{\varepsilon_i + 1}} & i = k \end{cases}$

\begin{align*} z_i^{(0)} & = \mathrm{sgn}(y_i) (\gamma_{ii}^{\varepsilon_i} |y_i|)^{\frac{1}{1 - \alpha_{ii}\varepsilon_i}} \\ z_i^{(n+1)} & = \left( \beta_i + \sum_j \gamma_{ij} |z_j^{(n)}|^{\alpha_{ij}} \right)^{\varepsilon_i} y_i \tag{10} \end{align*}
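The fixed-point iteration in $(10)$ can be sketched as follows. The parameters are hypothetical and were picked so that $\gamma_{ii} > 0$ and $\alpha_{ii} \varepsilon_i < 1$ strictly (the initial guess divides by $1 - \alpha_{ii}\varepsilon_i$). Convergence is observed for these values but is not guaranteed for arbitrary parameters; the stopping tolerance and iteration cap are also assumptions.

```python
import math

def denominators(z, beta, gamma, alpha):
    """beta_i + sum_j gamma_ij * |z_j|^alpha_ij for every i."""
    n = len(z)
    return [beta[i] + sum(gamma[i][j] * abs(z[j]) ** alpha[i][j] for j in range(n))
            for i in range(n)]

def gdn_forward(z, beta, gamma, alpha, eps):
    """Normalization stage of equation (6)."""
    d = denominators(z, beta, gamma, alpha)
    return [z[i] / d[i] ** eps[i] for i in range(len(z))]

def gdn_inverse(y, beta, gamma, alpha, eps, tol=1e-12, max_iter=500):
    """Recover z from y with the fixed-point iteration of equation (10)."""
    n = len(y)
    # Initial guess z^(0): invert the bound in (9) as if it held with equality.
    z = [math.copysign(
            (gamma[i][i] ** eps[i] * abs(y[i])) ** (1.0 / (1.0 - alpha[i][i] * eps[i])),
            y[i])
         for i in range(n)]
    for _ in range(max_iter):
        d = denominators(z, beta, gamma, alpha)
        z_new = [d[i] ** eps[i] * y[i] for i in range(n)]
        if max(abs(z_new[i] - z[i]) for i in range(n)) < tol:
            return z_new
        z = z_new
    return z  # may not have converged within max_iter

# Hypothetical parameters (not values from the paper).
beta = [1.0, 1.0]
gamma = [[0.5, 0.1], [0.1, 0.5]]
alpha = [[1.5, 1.5], [1.5, 1.5]]
eps = [0.5, 0.5]

z_true = [1.0, -0.5]
y = gdn_forward(z_true, beta, gamma, alpha, eps)
z_rec = gdn_inverse(y, beta, gamma, alpha, eps)
recon_error = max(abs(z_rec[i] - z_true[i]) for i in range(2))
print(z_rec, recon_error)
```

Once $\boldsymbol{z}$ is recovered, $\boldsymbol{x}$ follows from $\boldsymbol{x} = \boldsymbol{H}^{-1} \boldsymbol{z}$, which is why $\boldsymbol{H}$ must be nonsingular.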

## 5. Experiments

### 5.1 Pairs of Wavelet Coefficients

\begin{align*} I(y_1, y_2) & = D_{\mathrm{KL}}(p(y_1, y_2) \parallel p(y_1) \otimes p(y_2)) \\ & = \mathbb{E}_{\boldsymbol{y}} \log{\left( \frac{p_{\boldsymbol{y}}(\boldsymbol{y})}{p_{y_1}(y_1) p_{y_2}(y_2)} \right)} \\ & = \mathbb{E}_{\boldsymbol{y}} \log{p_{\boldsymbol{y}}(\boldsymbol{y})} - \mathbb{E}_{y_1} \log{p_{y_1}(y_1)} - \mathbb{E}_{y_2} \log{p_{y_2}(y_2)} \end{align*}
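The decomposition above can be illustrated on a discrete analogue, where expectations become finite sums. The joint tables below are made-up examples, not wavelet statistics; the sketch only checks that the KL form and the three-term form of the mutual information agree, and that independence gives zero.

```python
import math

def marginals(joint):
    """Row and column marginals of a joint probability table."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    return p1, p2

def mutual_information_kl(joint):
    """I(y1, y2) as the KL divergence between the joint and the product of marginals."""
    p1, p2 = marginals(joint)
    return sum(p * math.log(p / (p1[i] * p2[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def mutual_information_terms(joint):
    """I(y1, y2) as E[log p(y1, y2)] - E[log p(y1)] - E[log p(y2)]."""
    p1, p2 = marginals(joint)
    e_joint = sum(p * math.log(p) for row in joint for p in row if p > 0)
    e_1 = sum(p * math.log(p) for p in p1 if p > 0)
    e_2 = sum(p * math.log(p) for p in p2 if p > 0)
    return e_joint - e_1 - e_2

# Made-up joint distributions over two binary variables.
dependent = [[0.4, 0.1], [0.1, 0.4]]
independent = [[0.25, 0.25], [0.25, 0.25]]

mi_kl = mutual_information_kl(dependent)
mi_terms = mutual_information_terms(dependent)
mi_indep = mutual_information_kl(independent)
print(mi_kl, mi_terms, mi_indep)
```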

### 5.2 Image Patches

#### 5.2.4 Denoising

$\hat{\boldsymbol{x}} = \tilde{\boldsymbol{x}} + \sigma^2 \nabla \log{p_{\tilde{\boldsymbol{x}}}} (\tilde{\boldsymbol{x}})$
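To see what the denoising rule does, it helps to plug in a density whose score is known in closed form. The sketch below swaps in a scalar Gaussian signal model as a stand-in (an assumption for illustration, not the learned density used in the paper): if $x \sim \mathcal{N}(0, s^2)$ and the noise has variance $\sigma^2$, then $\tilde{x} \sim \mathcal{N}(0, s^2 + \sigma^2)$ and the rule should reproduce the familiar shrinkage $\hat{x} = \tilde{x}\, s^2 / (s^2 + \sigma^2)$.

```python
def gaussian_score(x_noisy, prior_var, noise_var):
    """d/dx log p(x_noisy) for x_noisy ~ N(0, prior_var + noise_var)."""
    return -x_noisy / (prior_var + noise_var)

def denoise(x_noisy, noise_var, score):
    """Denoising rule: x_hat = x_noisy + sigma^2 * grad log p(x_noisy)."""
    return x_noisy + noise_var * score(x_noisy)

# Illustrative values only.
prior_var = 4.0   # s^2, variance of the clean signal
noise_var = 1.0   # sigma^2, variance of the additive Gaussian noise
x_noisy = 3.0

x_hat = denoise(x_noisy, noise_var,
                score=lambda v: gaussian_score(v, prior_var, noise_var))
# Closed-form shrinkage estimate for the Gaussian case, used as a check.
shrinkage = x_noisy * prior_var / (prior_var + noise_var)
print(x_hat, shrinkage)
```

With a learned density model, `gaussian_score` would be replaced by the gradient of that model's log-density of the noisy data.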

## Appendix

