Machine Learning

Stanford Univ, Coursera


Self-study: Neural Network Back Propagation

Let $\displaystyle \boldsymbol{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} , \quad \boldsymbol{W} = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix} , \quad \boldsymbol{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \quad $ and set $ \boldsymbol{y} = \boldsymbol{W} \boldsymbol{x} $. Then
$\begin{eqnarray} \displaystyle \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} & = & \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \\ & = & \begin{pmatrix} w_{11} x_1 + w_{12} x_2 + w_{13} x_3 \\ w_{21} x_1 + w_{22} x_2 + w_{23} x_3 \end{pmatrix} \end{eqnarray}$
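
As a concrete check, the forward product can be computed directly. A minimal NumPy sketch, with made-up values for $\boldsymbol{W}$ and $\boldsymbol{x}$ (the $2 \times 3$ matrix and 3-vector shapes match the definitions above):

    import numpy as np

    W = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # 2x3 weight matrix (example values)
    x = np.array([0.5, -1.0, 2.0])    # x = (x_1, x_2, x_3)

    y = W @ x                         # y_i = sum_j w_ij * x_j
    print(y)                          # -> [4.5 9. ]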

$\displaystyle \frac{\partial y_1}{\partial x_1} = w_{11} ~, \quad \frac{\partial y_1}{\partial x_2} = w_{12} ~, \quad \frac{\partial y_1}{\partial x_3} = w_{13} ~, \quad $
$\displaystyle \frac{\partial y_2}{\partial x_1} = w_{21} ~, \quad \frac{\partial y_2}{\partial x_2} = w_{22} ~, \quad \frac{\partial y_2}{\partial x_3} = w_{23} ~ \quad $

Since the loss $L$ depends on $\boldsymbol{x}$ only through $\boldsymbol{y}$, the chain rule gives

$\displaystyle \frac{\partial L}{\partial x_1} = \frac{\partial y_1}{\partial x_1} \cdot \frac{\partial L}{\partial y_1} + \frac{\partial y_2}{\partial x_1} \cdot \frac{\partial L}{\partial y_2} = w_{11} \cdot \frac{\partial L}{\partial y_1} + w_{21} \cdot \frac{\partial L}{\partial y_2} ~, \quad $
$\displaystyle \frac{\partial L}{\partial x_2} = \frac{\partial y_1}{\partial x_2} \cdot \frac{\partial L}{\partial y_1} + \frac{\partial y_2}{\partial x_2} \cdot \frac{\partial L}{\partial y_2} = w_{12} \cdot \frac{\partial L}{\partial y_1} + w_{22} \cdot \frac{\partial L}{\partial y_2} ~, \quad $
$\displaystyle \frac{\partial L}{\partial x_3} = \frac{\partial y_1}{\partial x_3} \cdot \frac{\partial L}{\partial y_1} + \frac{\partial y_2}{\partial x_3} \cdot \frac{\partial L}{\partial y_2} = w_{13} \cdot \frac{\partial L}{\partial y_1} + w_{23} \cdot \frac{\partial L}{\partial y_2} $

$\displaystyle \therefore \begin{pmatrix} \frac{\partial L}{\partial x_1} \\ \frac{\partial L}{\partial x_2} \\ \frac{\partial L}{\partial x_3} \\ \end{pmatrix} = \begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{pmatrix} \begin{pmatrix} \frac{\partial L}{\partial y_1} \\ \frac{\partial L}{\partial y_2} \end{pmatrix} = \boldsymbol{W}^{T} \begin{pmatrix} \frac{\partial L}{\partial y_1} \\ \frac{\partial L}{\partial y_2} \end{pmatrix} $
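
This transpose rule is easy to verify numerically. A minimal sketch, assuming the example loss $L(\boldsymbol{y}) = \frac{1}{2} \lVert \boldsymbol{y} \rVert^2$ (so that $\partial L / \partial \boldsymbol{y} = \boldsymbol{y}$; any other differentiable loss would do), checked against central-difference gradients:

    import numpy as np

    W = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    x = np.array([0.5, -1.0, 2.0])

    def loss(x):
        return 0.5 * np.sum((W @ x) ** 2)   # assumed example loss L = 0.5*||Wx||^2

    dL_dy = W @ x                # dL/dy = y for this particular loss
    dL_dx = W.T @ dL_dy          # the back propagation rule derived above

    eps = 1e-6                   # numerical gradient via central differences
    num = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
    print(dL_dx, num)            # the two gradients agree to ~1e-6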
[Note]

As a prerequisite for this explanation, one needs to understand that the back propagation of a Repeat operation becomes a sum.

[forward propagation]
                    +-----> x_1
                    |
      x --->[REPEAT]+-----> x_2
                    |
                    ...
                    +-----> x_n
$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \mbox{REPEAT}(x) $
[backward propagation] Taking $L$ to be the value of the loss function:
                    +-----< ∂L/∂x_1
                    |
  ∂L/∂x <---[REPEAT]+-----< ∂L/∂x_2
                    |
                    ...
                    +-----< ∂L/∂x_n
$\displaystyle \begin{eqnarray} \frac{\partial L}{\partial x} & = & \frac{\partial L}{\partial x_1} + \frac{\partial L}{\partial x_2} + \cdots + \frac{\partial L}{\partial x_n} \\ & = & \sum_{k=1}^n \frac{\partial L}{\partial x_k} \end{eqnarray}$
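
The REPEAT node itself fits in a few lines. A minimal sketch (the names repeat_forward / repeat_backward are illustrative, not from the course material):

    import numpy as np

    def repeat_forward(x, n):
        # forward: copy the scalar x into n identical outputs
        return np.full(n, x)

    def repeat_backward(dL_dxs):
        # backward: the gradients flowing into the n copies are summed
        return np.sum(dL_dxs)

    xs = repeat_forward(3.0, 4)                               # -> [3. 3. 3. 3.]
    dL_dx = repeat_backward(np.array([0.1, -0.2, 0.3, 0.4]))  # 0.1 - 0.2 + 0.3 + 0.4
    print(xs, dL_dx)                                          # -> 0.6 (up to float rounding)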
Yoshihisa Nitta

http://nw.tsuda.ac.jp/