标签搜索

目 录CONTENT

文章目录

『聚合』 Softmax偏导及BP过程的推导

沙漠渔
2024-05-07 07:13:34 / 0 评论 / 0 点赞 / 167 阅读 / 3,294 字 / 正在检测是否收录...
温馨提示:
本文最后更新于 2024-05-07,若内容或图片失效,请留言反馈。部分素材来自网络,若不小心影响到您的利益,请联系我们删除。

Softmax求导

其实BP过程在pytorch中可以自动进行,这里进行推导只是强迫症

A

Apart证明softmax求导和softmax的BP过程
本来像手打公式的,想想还是算了,引用部分给出latex公式说明。

A.1

softmax导数
image

A.2

softmax梯度下降
image

B

基本上都是拾人牙慧,在此给出引用和参考。

参考:

  • 矩阵求导术(下) - 知乎 (zhihu.com)

  • nndl


\(引用几个定理B.15和B.16\)

\((B.15)\)

\[ \begin{aligned} & \vec{x} \in k^{M \times 1}, y \in R, \vec{z} \in R^{N \times 1},\quad 则: \\ & \frac{\partial y \vec{z}}{\partial \vec{x}}=y \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top} \in R^{M \times N} \end{aligned} \]

\[\begin{aligned} & \text{[证明]:} \\ & dy\vec{z} \\ & =d y \cdot \vec{z}+y \cdot d \vec{z} \\ &=\vec{z} \cdot d y+y \cdot d \vec{z} \\ &=\vec{z} \cdot \left(\frac{\partial y}{\partial \vec{x}}\right)^{\top} d \vec{x}+y \cdot\left(\frac{\partial \vec{z}}{\partial \vec{x}}\right)^{\top} d \vec{x} \\ & \therefore \frac{\partial y \vec{z}}{\partial \vec{x}}=y \cdot \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top} \end{aligned} \]

\((B.26)\)

\[\begin{aligned} & \vec{x} \in R^N, \quad \vec{f}(\vec{x})=\left[f\left(x_1\right), f\left(x_2\right) \ldots f\left(x_n\right)\right] \in R^N, 则 \\ & \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right) \end{aligned} \]

\[\begin{aligned} & \text { [证明]: } \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\left[\begin{array}{cccc} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_2}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial \eta_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_1}{\partial x_n} & \frac{\partial f_1}{\partial x_n} & \cdots & -\frac{\partial f_n}{\partial x_n} \end{array}\right]=\left[\begin{array}{llll} f^{\prime}\left(x_1\right) & & \\ & f^{\prime}\left(x_2\right) & & \\ & & \ddots & \\ & & & f^{\prime}\left(x_n\right) \end{array}\right]=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right) \end{aligned} \]

\(Apart中必须说明的两个推导:\)
\((1)\)

\[\begin{aligned} & \vec{x} \in R^n, \exp (\vec{x})=\left[\begin{array}{c} \exp \left(x_1\right) \\ \vdots \\ \exp \left(x_n\right) \end{array}\right] \in R^n\\ & 故存在偏导:\frac{\partial \exp (\vec{x})}{\partial \vec{x}}=\left[\begin{array}{ccc} \frac{\partial \exp \left(x_1\right)}{\partial x_1} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_1} \\ \vdots & & \\ \frac{\partial \exp \left(x_1\right)}{\partial x_n} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_n} \end{array}\right]=\operatorname{diag}(\exp (\vec{x})) \end{aligned} \]

\((2)\)

\[\begin{aligned} & d\vec{1}^{\top} \exp (\vec{x}) \\ & =\vec{1}^{\top} d \exp (\vec{x}) \\ &=\vec{1}^{\top}\left(\exp ^{\prime}(\vec{x}) \odot d \vec{x}\right) \\ &=\left(\vec{1} \odot \exp ^{\prime}(\vec{x})\right)^{\top} d \vec{x} \\ & \text { 有: } \frac{\partial \vec{1}^{\top} \exp (\vec{x})}{\partial \vec{x}}=\vec{1} \odot \exp ^{\prime}(\vec{x})=\exp ^{\prime}(\vec{x})=\exp (\vec{x}) \end{aligned} \]

C

理解可能有偏颇。


⚠ 文章源地址: https://www.cnblogs.com/aoidayo/p/18005371.html 转载请注明出处
0
广告 广告

评论区