An ad hoc vector space structure on the parameters of the layers of a neural net.
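To make the idea concrete, here is a minimal sketch in Python (the library itself is not Python, and the class shape here is a hypothetical stand-in for `ParameterVector`): the parameters form a vector space layer by layer, with addition and scalar multiplication applied elementwise to each weight matrix.

```python
# Hypothetical sketch of the layerwise vector space structure:
# parameters are a list of per-layer weight matrices, combined
# elementwise, layer by layer.
class ParameterVector:
    def __init__(self, layers):
        # layers: list of 2-D lists (one weight matrix per layer)
        self.layers = layers

    def __add__(self, other):
        # Elementwise addition, matched layer by layer.
        return ParameterVector([
            [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(la, lb)]
            for la, lb in zip(self.layers, other.layers)
        ])

    def __rmul__(self, c):
        # Scalar multiplication applied to every weight.
        return ParameterVector([
            [[c * w for w in row] for row in layer]
            for layer in self.layers
        ])

v = ParameterVector([[[1.0, 2.0]]])
w = 2.0 * v + v   # layers == [[[3.0, 6.0]]]
```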
Given fixed data and a differentiable error
function for matrix-valued outputs, returns a
differentiable function that takes wrapped neural nets
as arguments and returns neural-net-valued
gradients (again wrapped as ParameterVector).
For this, the argument neural net is composed with the error function and applied to the fixed data.
Symbolically, if x is our neural net, d is the data,
and E our error function, this method returns the function
x => E(x(d)), which is evaluated in the usual feed-forward manner.
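As a toy illustration of the composition x => E(x(d)), here is a Python sketch (names like `error_on_data` are hypothetical, not the library's API): fixing the data d and the error E yields a scalar-valued function of the network itself.

```python
# Hypothetical sketch of x => E(x(d)): fix the data d and the error
# function E, and obtain a function that maps a network to its error.
def error_on_data(d, E):
    # Returns the function net -> E(net(d)).
    def objective(net):
        return E(net(d))   # feed the data forward, then score the output
    return objective

# Toy usage: the "network" is just a scaling function, and E is the
# squared error against a target value of 6.0.
net = lambda x: 3.0 * x
E = lambda y: (y - 6.0) ** 2
f = error_on_data(2.0, E)
# f(net) evaluates E(net(2.0)) = (6.0 - 6.0)**2 = 0.0
```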
The gradient of this function is evaluated with the classical
backpropagation algorithm. The whole composition might look
somewhat odd, but keep in mind that we want a function that
takes neural nets as inputs and returns the errors on data as
outputs, and furthermore calculates neural-net-valued gradients,
which simply store the gradients w.r.t. the parameters of the neural
net in a data structure that looks exactly like the neural net itself.