Machine Learning
Stanford Univ, Coursera
Self-study: Neural Network Back Propagation
$\displaystyle
\boldsymbol{x} = \begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix} , \quad
\boldsymbol{W} = \begin{pmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
\end{pmatrix} , \quad
\boldsymbol{y} = \begin{pmatrix}
y_1 \\ y_2
\end{pmatrix} \quad
$
Defining $\boldsymbol{x}$, $\boldsymbol{W}$, and $\boldsymbol{y}$ as above, and setting
$
\boldsymbol{y} = \boldsymbol{W} \boldsymbol{x} \quad $
we have
$\begin{eqnarray} \displaystyle
\begin{pmatrix}
y_1 \\ y_2
\end{pmatrix}
& = &
\begin{pmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
\end{pmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix} \\
& = &
\begin{pmatrix}
w_{11} x_1 + w_{12} x_2 + w_{13} x_3 \\
w_{21} x_1 + w_{22} x_2 + w_{23} x_3 \\
\end{pmatrix}
\end{eqnarray}$
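To make the shapes concrete, the sketch below runs this forward pass numerically; the values chosen for W and x are arbitrary and only for illustration.

# A minimal numeric sketch of y = W x above; the concrete values of W (2x3)
# and x (3,) are arbitrary assumptions, used only to make the shapes visible.
import numpy as np

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # w_11 .. w_23
x = np.array([7.0, 8.0, 9.0])      # x_1, x_2, x_3

y = W @ x                          # y_i = w_i1 x_1 + w_i2 x_2 + w_i3 x_3
print(y)                           # [ 50. 122.]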
$\displaystyle
\frac{\partial y_1}{\partial x_1} = w_{11} ~, \quad
\frac{\partial y_1}{\partial x_2} = w_{12} ~, \quad
\frac{\partial y_1}{\partial x_3} = w_{13} ~, \quad
$
$\displaystyle
\frac{\partial y_2}{\partial x_1} = w_{21} ~, \quad
\frac{\partial y_2}{\partial x_2} = w_{22} ~, \quad
\frac{\partial y_2}{\partial x_3} = w_{23} ~ \quad
$
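These derivatives mean that the Jacobian of y with respect to x is W itself, which can be checked by finite differences (reusing the illustrative W and x from the sketch above; eps is an arbitrary step size).

# Finite-difference check that dy_i/dx_j = w_ij.
import numpy as np

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([7.0, 8.0, 9.0])

eps = 1e-6
jacobian = np.column_stack([
    (W @ (x + eps * e) - W @ (x - eps * e)) / (2 * eps)   # j-th column: dy/dx_j
    for e in np.eye(3)
])
print(np.allclose(jacobian, W))    # True: the Jacobian of y = W x is W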
$\displaystyle
\frac{\partial L}{\partial x_1} =
\frac{\partial y_1}{\partial x_1} \cdot \frac{\partial L}{\partial y_1} + \frac{\partial y_2}{\partial x_1} \cdot \frac{\partial L}{\partial y_2}
= w_{11} \cdot \frac{\partial L}{\partial y_1} + w_{21} \cdot \frac{\partial L}{\partial y_2} ~, \quad
$
$\displaystyle
\frac{\partial L}{\partial x_2} =
\frac{\partial y_1}{\partial x_2} \cdot \frac{\partial L}{\partial y_1} + \frac{\partial y_2}{\partial x_2} \cdot \frac{\partial L}{\partial y_2}
= w_{12} \cdot \frac{\partial L}{\partial y_1} + w_{22} \cdot \frac{\partial L}{\partial y_2} ~, \quad
$
$\displaystyle
\frac{\partial L}{\partial x_3} =
\frac{\partial y_1}{\partial x_3} \cdot \frac{\partial L}{\partial y_1} + \frac{\partial y_2}{\partial x_3} \cdot \frac{\partial L}{\partial y_2}
= w_{13} \cdot \frac{\partial L}{\partial y_1} + w_{23} \cdot \frac{\partial L}{\partial y_2} ~, \quad
$
$\displaystyle \therefore
\begin{pmatrix}
\frac{\partial L}{\partial x_1} \\
\frac{\partial L}{\partial x_2} \\
\frac{\partial L}{\partial x_3} \\
\end{pmatrix}
=
\begin{pmatrix}
w_{11} & w_{21} \\
w_{12} & w_{22} \\
w_{13} & w_{23}
\end{pmatrix}
\begin{pmatrix}
\frac{\partial L}{\partial y_1} \\
\frac{\partial L}{\partial y_2}
\end{pmatrix}
=
\boldsymbol{W}^{T}
\begin{pmatrix}
\frac{\partial L}{\partial y_1} \\
\frac{\partial L}{\partial y_2}
\end{pmatrix}
$
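This result dL/dx = W^T dL/dy can also be verified numerically. The sketch below assumes an arbitrary loss L(y) = y_1^2 + y_2^2 purely for illustration (the derivation above holds for any L) and compares the back-propagated gradient with central finite differences.

# Numerical check of dL/dx = W^T dL/dy; the loss L(y) = y_1^2 + y_2^2 is an
# arbitrary illustrative choice, not something fixed by the derivation above.
import numpy as np

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([7.0, 8.0, 9.0])

def loss(x):
    y = W @ x
    return np.sum(y ** 2)          # L(y) = y_1^2 + y_2^2

y = W @ x
dL_dy = 2.0 * y                    # analytic dL/dy for this particular L
dL_dx = W.T @ dL_dy                # back propagation: dL/dx = W^T dL/dy

eps = 1e-6
numeric = np.array([               # central finite differences in each x_j
    (loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(dL_dx, numeric)) # True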
[Note]
As a prerequisite for this explanation, one needs to understand that the back propagation of a Repeat operation becomes a sum.
[forward propagation]
              +-----> x_1
              |
x --->[REPEAT]+-----> x_2
              |
              ...
              +-----> x_n
$\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{pmatrix}
= \mbox{REPEAT}(x)
$
[backward propagation] With L denoting the value of the loss function,
                  +-----< ∂L/∂x_1
                  |
∂L/∂x <---[REPEAT]+-----< ∂L/∂x_2
                  |
                  ...
                  +-----< ∂L/∂x_n
$\displaystyle \begin{eqnarray}
\frac{\partial L}{\partial x} & = & \frac{\partial L}{\partial x_1} + \frac{\partial L}{\partial x_2} + \cdots + \frac{\partial L}{\partial x_n} \\
& = & \sum_{k=1}^n \frac{\partial L}{\partial x_k}
\end{eqnarray}$
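A minimal sketch of the REPEAT node and its backward pass follows. The helper names repeat / repeat_backward and the loss L = c_1 x_1 + c_2 x_2 + c_3 x_3 are illustrative assumptions, not from any library.

# REPEAT forward copies a scalar x into x_1..x_n; its backward pass sums the
# incoming gradients, i.e. dL/dx = dL/dx_1 + ... + dL/dx_n.
import numpy as np

def repeat(x, n):
    return np.full(n, x)            # forward: x_1 = x_2 = ... = x_n = x

def repeat_backward(dL_dxs):
    return np.sum(dL_dxs)           # backward: dL/dx = sum_k dL/dx_k

c = np.array([2.0, -1.0, 0.5])      # illustrative loss L = c_1 x_1 + c_2 x_2 + c_3 x_3
x = 3.0
xs = repeat(x, 3)
dL_dxs = c                          # dL/dx_k = c_k for this L
print(repeat_backward(dL_dxs))      # 1.5, which equals dL/dx of L = (c_1 + c_2 + c_3) x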
Yoshihisa Nitta
http://nw.tsuda.ac.jp/