# 2016-ICLR-Density Modeling of Images using a Generalized Normalization Transformation

## 2. Introduction

Earlier approaches to Gaussianizing natural-image statistics include PCA (Principal Component Analysis) (Jolliffe, 2002) [2], ICA (Independent Component Analysis) (Cardoso, 2003) [3], RG (Radial Gaussianization) (Lyu & Simoncelli, 2009b) [4], and $L_p$-nested symmetric distributions (Sinz & Bethge, 2010) [5].

## 3. Gaussianizing the Data

For an invertible, differentiable transform $\boldsymbol{y} = g(\boldsymbol{x}; \boldsymbol{\theta})$, the change-of-variables formula relates the two densities:

$p_{\boldsymbol{x}}(\boldsymbol{x}) = \left\lvert \frac{\partial g(\boldsymbol{x}; \boldsymbol{\theta})}{\partial \boldsymbol{x}}\right\rvert \ p_{\boldsymbol{y}}(g(\boldsymbol{x}; \boldsymbol{\theta})) \tag{1}$
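A quick numerical sanity check of $(1)$: if $p_{\boldsymbol{y}}$ is a standard normal and $g$ is any smooth monotone map, the density induced on $x$ must itself integrate to one. The map $g$ below is an arbitrary illustrative choice, not from the paper.

```python
import numpy as np

# Check eq. (1) in 1-d: with p_y standard normal and y = g(x) monotone,
# p_x(x) = |g'(x)| * N(g(x)) must integrate to 1.

def g(x):
    return x + x**3 / 3.0          # strictly increasing => invertible

def g_prime(x):
    return 1.0 + x**2

def std_normal(y):
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)

x = np.linspace(-6.0, 6.0, 20001)
p_x = g_prime(x) * std_normal(g(x))    # density induced by eq. (1)
# trapezoidal rule for the total probability mass
mass = float(np.sum(0.5 * (p_x[1:] + p_x[:-1]) * np.diff(x)))
```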

Taking the target density to be a standard normal $\mathcal{N}$, the objective is the KL divergence between $p_{\boldsymbol{y}}$ and $\mathcal{N}$:

\begin{align*} J(p_{\boldsymbol{y}}) & = \mathbb{E}_{\boldsymbol{y}} (\log{p_{\boldsymbol{y}}}(\boldsymbol{y}) - \log{\mathcal{N}(\boldsymbol{y})}) \\ & = \mathbb{E}_{\boldsymbol{x}} \left(\log{p_{\boldsymbol{x}}}(\boldsymbol{x}) - \log{\left\lvert \frac{\partial g(\boldsymbol{x};\boldsymbol{\theta})}{\partial \boldsymbol{x}} \right\rvert} - \log{\mathcal{N}(g(\boldsymbol{x}; \boldsymbol{\theta}))}\right) \tag{2} \end{align*}

Differentiating with respect to $\boldsymbol{\theta}$ gives

\begin{align*} \frac{\partial J(p_{\boldsymbol{y}})}{\partial \boldsymbol{\theta}} = \mathbb{E}_{\boldsymbol{x}} \left( -\sum_{ij} \left[ \frac{\partial g(\boldsymbol{x}; \boldsymbol{\theta})}{\partial \boldsymbol{x}} \right]_{ij}^{-T} \frac{\partial^2 g_i(\boldsymbol{x}; \boldsymbol{\theta})}{\partial x_j \partial \boldsymbol{\theta}} + \sum_{i} g_i(\boldsymbol{x}; \boldsymbol{\theta}) \frac{\partial g_i(\boldsymbol{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right) \tag{3} \end{align*}

$\Delta J \equiv J(p_{\boldsymbol{y}}) - J(p_{\boldsymbol{x}}) = \mathbb{E}_{\boldsymbol{x}} \left( \frac{1}{2} \lVert \boldsymbol{y} \rVert_2^2 - \log{\left\lvert \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \right\rvert} - \frac{1}{2} \lVert \boldsymbol{x} \rVert_2^2 \right) \tag{4}$

Equation $(4)$ measures how much more Gaussian the transformed data $\boldsymbol{y}$ is than the original data $\boldsymbol{x}$: a negative $\Delta J$ means the transform has moved the data closer to a standard normal.
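A Monte Carlo sketch of $(4)$ for the simplest case, a linear map $\boldsymbol{y} = \boldsymbol{W}\boldsymbol{x}$: whitening a correlated Gaussian should give $\Delta J < 0$, and for Gaussian data a closed form exists to compare against. The covariance below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# For x ~ N(0, Sigma) and W the whitening transform (W Sigma W^T = I),
# Delta J in eq. (4) should be negative: y is closer to N(0, I) than x.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
L = np.linalg.cholesky(Sigma)
W = np.linalg.inv(L)

x = rng.standard_normal((100000, 2)) @ L.T   # samples with covariance Sigma
y = x @ W.T

# Delta J = E[ 0.5||y||^2 - log|dy/dx| - 0.5||x||^2 ]; here dy/dx = W.
log_det_W = np.linalg.slogdet(W)[1]
delta_J = np.mean(0.5 * np.sum(y**2, 1) - log_det_W - 0.5 * np.sum(x**2, 1))

# Closed form for the Gaussian case: d/2 + 0.5*log|Sigma| - tr(Sigma)/2.
d = 2
closed = d / 2 + 0.5 * np.linalg.slogdet(Sigma)[1] - np.trace(Sigma) / 2
```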

## 4. Divisive Normalization

The classical divisive normalization transform (Heeger, 1992) [6] is

$y_i = \gamma \frac{x_i^\alpha}{\beta^\alpha + \sum_j x_j^\alpha} \tag{5}$

The paper generalizes it to

\begin{align*} \boldsymbol{y} = g(\boldsymbol{x};\boldsymbol{\theta}) && \text{ s.t. } &&& y_i = \frac{z_i}{(\beta_i + \sum_j \gamma_{ij} |z_j|^{\alpha_{ij}})^{\varepsilon_i}}\\ && \text{ and } &&& \boldsymbol{z} = \boldsymbol{H} \boldsymbol{x} \tag{6} \end{align*}
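As a concrete illustration of $(6)$, here is a minimal NumPy sketch of the forward transform; the function name `gdn_forward` and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np

# Forward pass of eq. (6): z = H x, then
# y_i = z_i / (beta_i + sum_j gamma_ij |z_j|^{alpha_ij})^{eps_i}.

def gdn_forward(x, H, beta, gamma, alpha, eps):
    z = H @ x
    # denom[i] = (beta_i + sum_j gamma_ij |z_j|^{alpha_ij})^{eps_i}
    denom = (beta + np.sum(gamma * np.abs(z)[None, :] ** alpha, axis=1)) ** eps
    return z / denom, z

rng = np.random.default_rng(1)
d = 4
H = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # near-identity mixing
beta = np.full(d, 0.5)
gamma = np.full((d, d), 0.1) + 0.4 * np.eye(d)      # nonnegative weights
alpha = np.full((d, d), 2.0)
eps = np.full(d, 0.5)                               # eps_i <= 1/alpha_ii

x = rng.standard_normal(d)
y, z = gdn_forward(x, H, beta, gamma, alpha, eps)
```

Because the denominator is strictly positive, the transform preserves the sign of each $z_i$ and only rescales its magnitude.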

$\frac{\partial y_i}{\partial z_k} = \frac{\delta_{ik}}{(\beta_i + \sum_j \gamma_{ij} |z_j|^{\alpha_{ij}})^{\varepsilon_i}} - \frac{\alpha_{ik} \gamma_{ik} \varepsilon_i z_i |z_k|^{\alpha_{ik} - 1} \mathrm{sgn}(z_k)}{(\beta_i + \sum_j \gamma_{ij} |z_{j}|^{\alpha_{ij}})^{\varepsilon_i + 1}} \tag{7}$
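Equation $(7)$ can be verified numerically by comparing the analytic Jacobian against central finite differences. The sketch below takes $\boldsymbol{H} = \boldsymbol{I}$ (so the input is $\boldsymbol{z}$ directly) and uses illustrative parameter values.

```python
import numpy as np

def forward(z, beta, gamma, alpha, eps):
    denom = (beta + np.sum(gamma * np.abs(z)[None, :] ** alpha, axis=1)) ** eps
    return z / denom

def jacobian(z, beta, gamma, alpha, eps):
    # s_i = beta_i + sum_j gamma_ij |z_j|^{alpha_ij}
    s = beta + np.sum(gamma * np.abs(z)[None, :] ** alpha, axis=1)
    # second term of eq. (7): -alpha_ik gamma_ik eps_i z_i |z_k|^{alpha_ik-1}
    #                          sgn(z_k) / s_i^{eps_i + 1}
    J = -(alpha * gamma * eps[:, None] * z[:, None]
          * np.abs(z)[None, :] ** (alpha - 1) * np.sign(z)[None, :]
          / (s ** (eps + 1))[:, None])
    J += np.diag(1.0 / s ** eps)          # first term: delta_ik / s_i^{eps_i}
    return J

rng = np.random.default_rng(2)
d = 3
beta = np.full(d, 1.0)
gamma = np.full((d, d), 0.05) + 0.5 * np.eye(d)
alpha = np.full((d, d), 2.0)
eps = np.full(d, 0.5)
z = rng.standard_normal(d)                # arbitrary test point

J_analytic = jacobian(z, beta, gamma, alpha, eps)
h = 1e-6
J_numeric = np.empty((d, d))
for k in range(d):
    dz = np.zeros(d)
    dz[k] = h
    J_numeric[:, k] = (forward(z + dz, beta, gamma, alpha, eps)
                       - forward(z - dz, beta, gamma, alpha, eps)) / (2 * h)
```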

$\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \cdot \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} \tag{8}$

In Jacobian notation, $J_{\boldsymbol{y} \rightarrow \boldsymbol{x}} = J_{\boldsymbol{y} \rightarrow \boldsymbol{z}} \cdot J_{\boldsymbol{z} \rightarrow \boldsymbol{x}}$, where $J_{\boldsymbol{y} \rightarrow \boldsymbol{x}}$ denotes the Jacobian of $\boldsymbol{y}$ with respect to $\boldsymbol{x}$, and similarly for the others. For $J_{\boldsymbol{y} \rightarrow \boldsymbol{x}}$ to be invertible, i.e. nonsingular, the Jacobian determinant must satisfy $|J_{\boldsymbol{y} \rightarrow \boldsymbol{x}}| \neq 0$. Since the determinant of a product is the product of the determinants, this requires $|J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}| \cdot |J_{\boldsymbol{z} \rightarrow \boldsymbol{x}}| \neq 0$, i.e. both $|J_{\boldsymbol{z} \rightarrow \boldsymbol{x}}| \neq 0$ and $|J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}| \neq 0$.

• For $|J_{\boldsymbol{z} \rightarrow \boldsymbol{x}}| \neq 0$: since $J_{\boldsymbol{z} \rightarrow \boldsymbol{x}} = \boldsymbol{H}$, it suffices that the parameter matrix $\boldsymbol{H}$ be nonsingular, i.e. $|\boldsymbol{H}| \neq 0$.

• For $|J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}| \neq 0$: the authors give a sufficient condition, namely that $J_{\boldsymbol{y} \rightarrow \boldsymbol{z}}$ be positive definite, which is ultimately ensured by the parameter initialization. In addition, to make the inverse of transformation $(6)$ easy to compute, the authors require the scalar map $z_i \mapsto y_i$ to be invertible. From $(6)$:

$|y_i| = \frac{|z_i|}{(\beta_i + \sum_{j} \gamma_{ij} |z_j|^{\alpha_{ij}})^{\varepsilon_i}} \leq \frac{|z_i|}{\gamma_{ii}^{\varepsilon_i} |z_i|^{\alpha_{ii} \varepsilon_i}} = \gamma_{ii}^{-\varepsilon_i} |z_i|^{1-\alpha_{ii} \varepsilon_i} \tag{9}$

Since the scalar map $z_i \mapsto y_i$ must be continuous and invertible, $y_i$ has to be monotonic in $z_i$; the bound above shows this requires $1 - \alpha_{ii} \varepsilon_i \geq 0$, i.e. $\varepsilon_i \leq \alpha_{ii}^{-1}$.

In the special case where the off-diagonal weights vanish ($\gamma_{ik} = 0$ for $i \neq k$), using $z_i \, \mathrm{sgn}(z_i) = |z_i|$, equation $(7)$ reduces to

$\frac{\partial y_i}{\partial z_k} = \begin{cases} 0 & i \neq k \\ \frac{\beta_i + \gamma_{ii}|z_i|^{\alpha_{ii}}(1 - \alpha_{ii} \varepsilon_i)}{(\beta_i + \gamma_{ii} |z_i|^{\alpha_{ii}})^{\varepsilon_i + 1}} & i = k \end{cases}$

which is nonnegative precisely when $1 - \alpha_{ii} \varepsilon_i \geq 0$.

\begin{align*} z_i^{(0)} &= \mathrm{sgn}(y_i) \left(\gamma_{ii}^{\varepsilon_i} |y_i|\right)^{\frac{1}{1 - \alpha_{ii}\varepsilon_i}} \\ z_i^{(n+1)} &= \left( \beta_i + \sum_j \gamma_{ij} |z_j^{(n)}|^{\alpha_{ij}} \right)^{\varepsilon_i} y_i \tag{10} \end{align*}
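A sketch of the inversion in $(10)$: initialize from the closed-form diagonal solution, then apply the fixed-point update. Parameter values are illustrative ($\boldsymbol{H} = \boldsymbol{I}$, $\varepsilon_i < \alpha_{ii}^{-1}$); for these values the iteration converges quickly, though that is not guaranteed for arbitrary parameters.

```python
import numpy as np

def forward(z, beta, gamma, alpha, eps):
    return z / (beta + np.sum(gamma * np.abs(z)[None, :] ** alpha, 1)) ** eps

def invert(y, beta, gamma, alpha, eps, n_iter=50):
    g_ii = np.diag(gamma)
    a_ii = np.diag(alpha)
    # z^{(0)}: closed-form solution of the diagonal-only case
    z = np.sign(y) * (g_ii ** eps * np.abs(y)) ** (1.0 / (1.0 - a_ii * eps))
    # z^{(n+1)} = (beta + sum_j gamma_ij |z_j^{(n)}|^{alpha_ij})^{eps} * y
    for _ in range(n_iter):
        z = (beta + np.sum(gamma * np.abs(z)[None, :] ** alpha, 1)) ** eps * y
    return z

rng = np.random.default_rng(3)
d = 4
beta = np.full(d, 1.0)
gamma = np.full((d, d), 0.05) + 0.5 * np.eye(d)
alpha = np.full((d, d), 2.0)
eps = np.full(d, 0.25)            # eps_i < 1/alpha_ii

z_true = rng.standard_normal(d)
y = forward(z_true, beta, gamma, alpha, eps)
z_rec = invert(y, beta, gamma, alpha, eps)
```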

## 5. Experiments

### 5.1 Pairs of Wavelet Coefficients

The dependence between a pair of coefficients is measured by their mutual information, written as a KL divergence between the joint density and the product of the marginals:

\begin{align*} I(y_1, y_2) & = D_{\mathrm{KL}}(p(y_1, y_2) \parallel p(y_1) \otimes p(y_2)) \\ & = \mathbb{E}_{\boldsymbol{y}} \log{\left( \frac{p_{\boldsymbol{y}}(\boldsymbol{y})}{p_{y_1}(y_1) p_{y_2}(y_2)} \right)} \\ & = \mathbb{E}_{\boldsymbol{y}} \log{p_{\boldsymbol{y}}(\boldsymbol{y})} - \mathbb{E}_{y_1} \log{p_{y_1}(y_1)} - \mathbb{E}_{y_2} \log{p_{y_2}(y_2)} \end{align*}
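For a bivariate Gaussian with correlation $\rho$, the mutual information has the closed form $-\frac{1}{2}\log(1-\rho^2)$, which makes a handy sanity check of the decomposition above. A Monte Carlo sketch with an illustrative $\rho$:

```python
import numpy as np

rho = 0.8
Sigma = np.array([[1.0, rho], [rho, 1.0]])

rng = np.random.default_rng(4)
y = rng.multivariate_normal([0.0, 0.0], Sigma, size=200000)

# E[ log p(y1, y2) - log p(y1) - log p(y2) ] via the known densities
inv = np.linalg.inv(Sigma)
log_joint = (-0.5 * np.einsum('ni,ij,nj->n', y, inv, y)
             - np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(Sigma)[1])
log_marg = -0.5 * (y**2).sum(1) - np.log(2 * np.pi)   # two N(0,1) log-densities
mi_mc = np.mean(log_joint - log_marg)

mi_true = -0.5 * np.log(1 - rho**2)
```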

### 5.2 Image Patches

#### 5.2.4 Denoising

Denoising uses the empirical-Bayes estimator, which maps a noisy observation $\tilde{\boldsymbol{x}}$ (corrupted by Gaussian noise of variance $\sigma^2$) to

$\hat{\boldsymbol{x}} = \tilde{\boldsymbol{x}} + \sigma^2 \nabla \log{p_{\tilde{\boldsymbol{x}}}} (\tilde{\boldsymbol{x}})$
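A quick sanity check of this estimator (a sketch, not the paper's GDN-based experiment): with a Gaussian prior the score is linear and the formula reduces to Wiener shrinkage, whose mean-squared error is known in closed form.

```python
import numpy as np

# Prior x ~ N(0, s2), noise ~ N(0, sigma2). Then p(x_tilde) = N(0, s2+sigma2),
# grad log p(x_tilde) = -x_tilde / (s2 + sigma2), and the estimator becomes
# x_hat = x_tilde * s2 / (s2 + sigma2), with MMSE = s2*sigma2 / (s2+sigma2).
s2, sigma2 = 4.0, 1.0
rng = np.random.default_rng(5)
x = np.sqrt(s2) * rng.standard_normal(100000)
x_tilde = x + np.sqrt(sigma2) * rng.standard_normal(100000)

score = -x_tilde / (s2 + sigma2)          # grad log p_{x_tilde}(x_tilde)
x_hat = x_tilde + sigma2 * score          # the estimator above

mse_denoised = np.mean((x_hat - x)**2)
mse_noisy = np.mean((x_tilde - x)**2)
```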

## References

1. Ballé, J., Laparra, V., & Simoncelli, E. P. (2016, January). Density modeling of images using a generalized normalization transformation. In 4th International Conference on Learning Representations, ICLR 2016.
2. Jolliffe, I. T. Principal Component Analysis. Springer, 2 edition, 2002. ISBN 978-0-387-95442-4.
3. Cardoso, Jean-François. Dependence, correlation and Gaussianity in independent component analysis. Journal of Machine Learning Research, 4:1177–1203, 2003. ISSN 1533-7928.
4. Lyu, Siwei and Simoncelli, Eero P. Nonlinear extraction of independent components of natural images using radial Gaussianization. Neural Computation, 21(6), 2009b. doi: 10.1162/neco.2009.04-08-773.
5. Sinz, Fabian and Bethge, Matthias. Lp-nested symmetric distributions. Journal of Machine Learning Research, 11:3409–3451, 2010. ISSN 1533-7928.
6. Heeger, David J. Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 1992. doi: 10.1017/S0952523800009640.
7. Figueiredo, M. A. T. and Nowak, R. D. Wavelet-based image estimation: an empirical bayes approach using Jeffrey’s noninformative prior. IEEE Transactions on Image Processing, 10(9), September 2001. doi: 10.1109/83.941856.
8. Portilla, Javier, Strela, Vasily, Wainwright, Martin J., and Simoncelli, Eero P. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing, 12(11), November 2003. doi: 10.1109/TIP.2003.818640.
9. Theis, Lucas and Bethge, Matthias. Generative image modeling using spatial LSTMs. In Advances in Neural Information Processing Systems 28, pp. 1918–1926, 2015.
10. Ballé, Johannes and Simoncelli, Eero P. Learning sparse filterbank transforms with convolutional ICA. In 2014 IEEE International Conference on Image Processing (ICIP), 2014. doi: 10.1109/ICIP.2014.7025815.