# Matrix Calculus Layouts

Note: based on Qiu Xipeng (邱锡鹏), *Neural Networks and Deep Learning*.

## 2. Partial Derivatives

• Numerator layout
• Denominator layout

Note: unless stated otherwise, vectors are column vectors.

### 2.1 Partial Derivatives of a Scalar with Respect to a Vector

• Denominator layout:

${\begin{array}{c} \frac{\partial y}{\partial \boldsymbol{x}} = [ \frac{\partial y}{\partial x_1}, \cdots, \frac{\partial y}{\partial x_M} ]^T \in \mathbb{R}^{M \times 1} \end{array}}$

• Numerator layout:

${\begin{array}{c} \frac{\partial y}{\partial \boldsymbol{x}} = [ \frac{\partial y}{\partial x_1}, \cdots, \frac{\partial y}{\partial x_M} ] \in \mathbb{R}^{1 \times M} \end{array}}$
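To make the two conventions concrete, here is a minimal NumPy sketch; the function $y = x_1^2 + 3x_2$ is a hypothetical example, not taken from the text. It estimates the gradient by central differences and reshapes it into each layout:

```python
import numpy as np

# Hypothetical example function: y = x1^2 + 3*x2, a scalar function of x in R^2.
def f(x):
    return x[0] ** 2 + 3 * x[1]

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of [dy/dx_1, ..., dy/dx_M]."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

x = np.array([1.0, 2.0])
g = numerical_gradient(f, x)            # analytically [2*x1, 3] = [2, 3]

denominator_layout = g.reshape(-1, 1)   # column vector, shape (M, 1)
numerator_layout = g.reshape(1, -1)     # row vector, shape (1, M)
print(denominator_layout.shape, numerator_layout.shape)  # (2, 1) (1, 2)
```

The same numbers appear in both layouts; only the shape, and hence how the result composes with other matrices, differs.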

### 2.2 Partial Derivatives of a Vector with Respect to a Scalar

• Denominator layout:

${\begin{array}{c} \frac{\partial{\boldsymbol{y}}}{\partial x} = [ \frac{\partial y_1}{\partial x}, \cdots, \frac{\partial y_N}{\partial x} ] \in \mathbb{R}^{1 \times N} \end{array}}$

• Numerator layout:

${\begin{array}{c} \frac{\partial{\boldsymbol{y}}}{\partial x} = [ \frac{\partial y_1}{\partial x}, \cdots, \frac{\partial y_N}{\partial x} ]^T \in \mathbb{R}^{N \times 1} \end{array}}$

### 2.3 Partial Derivatives of a Vector with Respect to a Vector

• Denominator layout (writing $\boldsymbol{y} = f(\boldsymbol{x})$ with $\boldsymbol{x} \in \mathbb{R}^M$, $\boldsymbol{y} \in \mathbb{R}^N$):

${\begin{array}{c} \frac{\partial{f(\boldsymbol{x})}}{\partial \boldsymbol{x}} = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_N}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_M} & \cdots & \frac{\partial y_N}{\partial x_M} \end{matrix} \right] = \boldsymbol{J}(f(\boldsymbol{x}))^T \in \mathbb{R}^{M \times N} \end{array}}$

• Numerator layout:

${\begin{array}{c} \frac{\partial{f(\boldsymbol{x})}}{\partial \boldsymbol{x}} = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_M} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_N}{\partial x_1} & \cdots & \frac{\partial y_N}{\partial x_M} \end{matrix} \right] = \boldsymbol{J}(f(\boldsymbol{x})) \in \mathbb{R}^{N \times M} \end{array}}$

• For the second derivative of a scalar function $y = f(\boldsymbol{x})$, the Hessian is symmetric, so denominator layout = numerator layout:

${\begin{array}{c} \frac{\partial^2 f(\boldsymbol{x})}{\partial \boldsymbol{x}^2} = \left[ \begin{matrix} \frac{\partial^2 y}{\partial x_1^2} & \cdots & \frac{\partial^2 y}{\partial x_1 \partial x_M} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 y}{\partial x_M \partial x_1} & \cdots & \frac{\partial^2 y}{\partial x_M^2} \end{matrix} \right] = \boldsymbol{H}(f(\boldsymbol{x})) \in \mathbb{R}^{M \times M} \end{array}}$

Note: $\boldsymbol{J}$ and $\boldsymbol{H}$ denote the Jacobian and Hessian matrices, respectively.
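The relation "denominator layout = transpose of the numerator layout" and the symmetry of the Hessian can both be checked numerically. A small sketch (the functions $f$ and $g$ below are hypothetical examples chosen for illustration):

```python
import numpy as np

# Hypothetical example: f maps R^2 -> R^3, f(x) = [x1*x2, x1^2, sin(x2)].
def f(x):
    return np.array([x[0] * x[1], x[0] ** 2, np.sin(x[1])])

def jacobian_numerator(F, x, eps=1e-6):
    """Numerator-layout Jacobian: J[n, m] = dy_n/dx_m, shape (N, M)."""
    N, M = len(F(x)), len(x)
    J = np.zeros((N, M))
    for m in range(M):
        e = np.zeros(M)
        e[m] = eps
        J[:, m] = (F(x + e) - F(x - e)) / (2 * eps)
    return J

def hessian(g, x, eps=1e-4):
    """Numerical Hessian of a scalar function via second-order differences."""
    M = len(x)
    H = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            ei = np.zeros(M); ei[i] = eps
            ej = np.zeros(M); ej[j] = eps
            H[i, j] = (g(x + ei + ej) - g(x + ei - ej)
                       - g(x - ei + ej) + g(x - ei - ej)) / (4 * eps ** 2)
    return H

x = np.array([1.0, 0.5])
J = jacobian_numerator(f, x)   # numerator layout, shape (3, 2)
J_denominator = J.T            # denominator layout is the transpose, shape (2, 3)

g = lambda v: v[0] ** 2 * v[1]
H = hessian(g, x)              # analytically [[2*x2, 2*x1], [2*x1, 0]], symmetric
```

Because the Hessian is symmetric (for twice continuously differentiable functions), transposing it changes nothing, which is why the two layouts coincide only in this case.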

## 3. Rules for Partial Derivatives

### 3.1 Sum and Difference Rule

Let $\boldsymbol{x} \in \mathbb{R}^M$, $\boldsymbol{y} = f(\boldsymbol{x}) \in \mathbb{R}^N$, and $\boldsymbol{z} = g(\boldsymbol{x}) \in \mathbb{R}^N$. Then, in denominator layout (used throughout this section),

${\begin{array}{c} \frac{\partial (\boldsymbol{y}+\boldsymbol{z})}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} + \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} \in \mathbb{R}^{M \times N} \end{array}}$

### 3.2 Product Rule

• Let $\boldsymbol{x} \in \mathbb{R}^M$, $\boldsymbol{y} = f(\boldsymbol{x}) \in \mathbb{R}^N$, and $\boldsymbol{z} = g(\boldsymbol{x}) \in \mathbb{R}^N$. Then

${\begin{array}{c} \frac{\partial \boldsymbol{y}^T \boldsymbol{z}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \boldsymbol{z} + \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} \boldsymbol{y} \in \mathbb{R}^M \end{array}}$

• Let $\boldsymbol{x} \in \mathbb{R}^M$, $\boldsymbol{y} = f(\boldsymbol{x}) \in \mathbb{R}^S$, $\boldsymbol{z} = g(\boldsymbol{x}) \in \mathbb{R}^T$, and let $\boldsymbol{A} \in \mathbb{R}^{S \times T}$ be independent of $\boldsymbol{x}$. Then

${\begin{array}{c} \frac{\partial \boldsymbol{y}^T \boldsymbol{Az}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \boldsymbol{A} \boldsymbol{z} + \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} \boldsymbol{A}^T \boldsymbol{y} \in \mathbb{R}^M \end{array}}$

• Let $\boldsymbol{x} \in \mathbb{R}^M$, $y = f(\boldsymbol{x}) \in \mathbb{R}$, and $\boldsymbol{z} = g(\boldsymbol{x}) \in \mathbb{R}^N$. Then

${\begin{array}{c} \frac{\partial y \boldsymbol{z}}{\partial \boldsymbol{x}} = y \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} + \frac{\partial y}{\partial \boldsymbol{x}} \boldsymbol{z}^T \in \mathbb{R}^{M \times N} \end{array}}$
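The second product rule above can be verified numerically in denominator layout. In the sketch below, the maps $f$, $g$ and the matrices are arbitrary smooth examples chosen for illustration, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
M, S, T = 3, 2, 4

# Arbitrary smooth example maps f: R^3 -> R^2, g: R^3 -> R^4, and a constant A.
A = rng.standard_normal((S, T))
B = rng.standard_normal((S, M))
C = rng.standard_normal((T, M))
f = lambda x: np.tanh(B @ x)
g = lambda x: np.sin(C @ x)

def jac_denominator(F, x, eps=1e-6):
    """Denominator-layout Jacobian: row m holds the derivatives w.r.t. x_m."""
    return np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

x = rng.standard_normal(M)
# Left-hand side: gradient of the scalar y^T A z, shape (M,).
lhs = jac_denominator(lambda v: f(v) @ A @ g(v), x)
# Right-hand side: (dy/dx) A z + (dz/dx) A^T y.
rhs = jac_denominator(f, x) @ A @ g(x) + jac_denominator(g, x) @ A.T @ f(x)
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```

The shape check is instructive: $\partial\boldsymbol{y}/\partial\boldsymbol{x} \in \mathbb{R}^{M \times S}$ times $\boldsymbol{A}\boldsymbol{z} \in \mathbb{R}^S$ gives $\mathbb{R}^M$, matching the left-hand side; the first rule in this section is the special case $\boldsymbol{A} = \boldsymbol{I}$.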

### 3.3 Chain Rule

• Let $x \in \mathbb{R}$, $\boldsymbol{y} = g(x) \in \mathbb{R}^M$, and $\boldsymbol{z} = f(\boldsymbol{y}) \in \mathbb{R}^N$. Then

${\begin{array}{c} \frac{\partial \boldsymbol{z}}{\partial x} = \frac{\partial \boldsymbol{y}}{\partial x} \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{y}} \in \mathbb{R}^{1 \times N} \end{array}}$

• Let $\boldsymbol{x} \in \mathbb{R}^M$, $\boldsymbol{y} = g(\boldsymbol{x}) \in \mathbb{R}^K$, and $\boldsymbol{z} = f(\boldsymbol{y}) \in \mathbb{R}^N$. Then

${\begin{array}{c} \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{y}} \in \mathbb{R}^{M \times N} \end{array}}$

• Let $\boldsymbol{X} \in \mathbb{R}^{M \times N}$ be a matrix, $\boldsymbol{y} = g(\boldsymbol{X}) \in \mathbb{R}^K$, and $z = f(\boldsymbol{y})$ a scalar. Then

${\begin{array}{c} \frac{\partial z}{\partial x_{ij}} = \frac{\partial \boldsymbol{y}}{\partial x_{ij}} \frac{\partial z}{\partial \boldsymbol{y}} \in \mathbb{R} \end{array}}$
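The vector-to-vector chain rule can likewise be checked numerically: in denominator layout the factors compose left to right, $(M \times K)(K \times N) = (M \times N)$. The maps below are arbitrary smooth examples chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, N = 3, 4, 2

# Arbitrary smooth example maps g: R^3 -> R^4 and f: R^4 -> R^2.
W1 = rng.standard_normal((K, M))
W2 = rng.standard_normal((N, K))
g = lambda x: np.tanh(W1 @ x)
f = lambda y: np.sin(W2 @ y)

def jac_denominator(F, x, eps=1e-6):
    """Denominator-layout Jacobian: entry [m, n] = dF_n/dx_m."""
    return np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

x = rng.standard_normal(M)
y = g(x)
lhs = jac_denominator(lambda v: f(g(v)), x)          # dz/dx, shape (M, N)
rhs = jac_denominator(g, x) @ jac_denominator(f, y)  # (M, K) @ (K, N)
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```

In numerator layout the same rule reads $\boldsymbol{J}(f \circ g) = \boldsymbol{J}(f)\,\boldsymbol{J}(g)$, i.e. the factors appear in the opposite order; this is the practical reason the layout convention must be fixed before chaining derivatives.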

## 4. Appendix

### 4.1 Mnemonics

• For "scalar w.r.t. vector" and "vector w.r.t. scalar" derivatives, think of the scalar as a one-dimensional row vector and of the vector as the usual column vector; then "denominator layout" vs. "numerator layout" simply says whether the result's layout matches the denominator or the numerator of the derivative expression.
1. Scalar-w.r.t.-vector derivative: denominator layout yields a column vector; numerator layout yields a row vector.
2. Vector-w.r.t.-scalar derivative: denominator layout yields a row vector; numerator layout yields a column vector.
• A vector-w.r.t.-vector derivative can be viewed either as each scalar element of the numerator vector differentiated with respect to the denominator vector, or as the numerator vector differentiated with respect to each scalar element of the denominator vector; the rule above then applies to each piece. Stacking the resulting vectors into a matrix gives the final denominator-/numerator-layout result.