Second Order Methods applied to Deep Neural Networks
Optimizing deep neural networks involves finding a sufficiently good minimum of a highly nonlinear and
nonconvex function. State-of-the-art first-order methods suffer from pathological curvature of the loss
landscape, and successful convergence relies on careful metaparameter tuning. Extending the optimizer
to second order addresses these problems, at the cost of having to compute the inverse Hessian of the
deep neural network, which takes O(N^3) time for a network with N parameters.
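
To make the curvature argument concrete, the following is a minimal sketch on a toy problem, not a deep network. The quadratic loss, the matrix A, the learning rate and the step count are all assumptions chosen for illustration. It compares ten gradient-descent steps with a single Newton step on an ill-conditioned quadratic.

```python
import numpy as np

# Toy loss f(w) = 0.5 * w^T A w with condition number 100 (illustrative assumption).
A = np.diag([1.0, 100.0])


def loss(w):
    return 0.5 * w @ A @ w


def grad(w):
    return A @ w


w_gd = np.array([1.0, 1.0])
w_newton = w_gd.copy()

# First order: the step size must stay below 2 / 100 to remain stable along the
# steep direction, so progress along the flat direction is slow.
lr = 0.01
for _ in range(10):
    w_gd = w_gd - lr * grad(w_gd)

# Second order: one Newton step. Solving H d = g with a dense solver costs O(N^3).
w_newton = w_newton - np.linalg.solve(A, grad(w_newton))

gd_loss = loss(w_gd)
newton_loss = loss(w_newton)
```

On this quadratic the Newton step lands on the minimum directly, while gradient descent is still far from it along the low-curvature direction. For a real network the Hessian is neither constant nor guaranteed to be positive definite, so this only illustrates the motivation.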
The R-operator allows Hessian-vector products of a DNN to be computed in O(N), without having to
store the whole Hessian. Combining this operator with the Lanczos algorithm, an iterative eigenvalue
solver, allows the eigenvalues of the Hessian of a DNN to be computed efficiently.
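
The combination of Hessian-vector products and Lanczos can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the toy loss, the matrix A, the point w and the helper names hvp and lanczos are hypothetical, and the exact R-operator is replaced by a central finite difference of the gradient, which likewise never forms the Hessian.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40

# Toy loss f(w) = 0.5 * w^T A w + 0.25 * sum(w^4), so that H(w) = A + 3 * diag(w^2).
B = rng.standard_normal((n, n))
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
A = B @ B.T / n + 5.0 * np.outer(u, u)
w = 0.1 * rng.standard_normal(n)


def grad(x):
    return A @ x + x ** 3


def hvp(v, eps=1e-4):
    # Stand-in for the R-operator: approximates H(w) v from two gradient evaluations.
    return (grad(w + eps * v) - grad(w - eps * v)) / (2.0 * eps)


def lanczos(matvec, dim, k, seed=1):
    # k-step Lanczos with full reorthogonalization; returns the Ritz values in
    # ascending order, which approximate the extreme eigenvalues of the operator.
    q = np.random.default_rng(seed).standard_normal(dim)
    Q = [q / np.linalg.norm(q)]
    alphas, betas = [], []
    for j in range(k):
        z = matvec(Q[j])
        alphas.append(Q[j] @ z)
        for qi in Q:  # remove components along all previous Lanczos vectors
            z = z - (qi @ z) * qi
        beta = np.linalg.norm(z)
        if j == k - 1 or beta < 1e-10:
            break
        betas.append(beta)
        Q.append(z / beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.linalg.eigvalsh(T)


ritz = lanczos(hvp, n, k=20)
largest_eigenvalue_estimate = ritz[-1]
```

Only k Hessian-vector products are needed, and the eigenvalue problem is solved on a small k-by-k tridiagonal matrix. Storing all Lanczos vectors for reorthogonalization costs O(kN) memory, which is a design choice made here for numerical stability in the sketch.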
A framework is built that visualizes the loss landscape of DNNs together with the
trajectory taken by the optimizer. This is done by performing a PCA over the network parameters
at different points of the trajectory and choosing the two directions in parameter space with the most