Johannes Blühdorn, TU Kaiserslautern
Automatic Differentiation of Fixed Points
In the context of automatic differentiation (AD), special care must be taken when differentiating code that incorporates a fixed point iteration. The memory consumption due to taping for the reverse mode of AD grows with the number of iterations, and even if the forward iteration converges, it is unclear whether the corresponding reverse iteration converges as well. Following a paper by Bruce Christianson, we present conditions under which the fixed point is differentiable with respect to the independent variables, and formulate a fixed point problem for the adjoint values. The reverse iteration is then replaced by iterating this adjoint fixed point problem. As it turns out, recording only the last forward iteration on tape provides everything needed for the reverse sweep, and the gradient can be obtained with the same accuracy as the original fixed point.

For either fixed point problem, it might not always be possible or efficient to decide in advance whether the contraction requirement of Banach's Fixed Point Theorem is satisfied. In the special case of a linear function, a violated contraction property corresponds to an iteration matrix with some eigenvalues outside the unit circle. The iteration is then in general unstable, but it can be stabilized by means of the recursive projection method. Following a paper by Renac, we present this method in the linear case and discuss its applicability in the context of the above AD scheme.
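The adjoint fixed point idea above can be illustrated with a minimal scalar sketch. For w* = f(w*, x), the derivative satisfies dw*/dx = (1 - f_w)^{-1} f_x at the fixed point, so the adjoint u = (1 - f_w)^{-1} can itself be found by a fixed point iteration u ← f_w u + 1 that only needs the converged forward iterate. The function f below, its hand-coded partial derivatives, and all tolerances are illustrative choices of this sketch, not part of the talk.

```python
import numpy as np

def f(w, x):
    # contraction in w: |df/dw| = 0.5 / cosh(w)^2 <= 0.5 < 1
    return 0.5 * np.tanh(w) + x

def df_dw(w, x):
    return 0.5 / np.cosh(w) ** 2

def df_dx(w, x):
    return 1.0

def solve_fixed_point(x, w0=0.0, tol=1e-12, max_iter=1000):
    w = w0
    for _ in range(max_iter):
        w_new = f(w, x)
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def adjoint_derivative(x, tol=1e-12, max_iter=1000):
    # forward solve; only the converged iterate is needed for the reverse sweep
    w = solve_fixed_point(x)
    a, b = df_dw(w, x), df_dx(w, x)
    # adjoint fixed point u = a*u + 1 (seeded with wbar = 1),
    # converging to u = 1/(1 - a); then xbar = b*u
    u = 0.0
    for _ in range(max_iter):
        u_new = a * u + 1.0
        if abs(u_new - u) < tol:
            break
        u = u_new
    return b * u

x = 0.3
h = 1e-6
fd = (solve_fixed_point(x + h) - solve_fixed_point(x - h)) / (2 * h)
print(adjoint_derivative(x), fd)  # the two values agree closely
```

Note that the adjoint iteration converges at the same asymptotic rate as the forward iteration, since both are driven by f_w at the fixed point.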
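In the linear case, the recursive projection method can be sketched as follows: split the iteration u ← Mu + b into the subspace spanned by the unstable eigenvectors of M, where a Newton step replaces the plain iteration, and its orthogonal complement, where the plain iteration is kept. The concrete matrix M, the right-hand side b, and the iteration counts below are illustrative choices of this sketch; in practice the unstable basis is built up adaptively rather than from a full eigendecomposition.

```python
import numpy as np

# Linear fixed point problem u = M u + b with one eigenvalue of M
# outside the unit circle, so the plain iteration diverges.
M = np.array([[1.5, 0.2],
              [0.0, 0.3]])
b = np.array([1.0, 1.0])
u_exact = np.linalg.solve(np.eye(2) - M, b)

# Orthonormal basis V of the unstable eigenspace (here: eigenvalue 1.5).
eigvals, eigvecs = np.linalg.eig(M)
unstable = np.abs(eigvals) > 1.0
V, _ = np.linalg.qr(eigvecs[:, unstable].real)

def rpm(u0, n_iter=50):
    u = u0.copy()
    m = V.shape[1]
    for _ in range(n_iter):
        Fu = M @ u + b
        # Newton step for the coordinates z = V^T u on the unstable subspace:
        # solve (I - V^T M V) dz = V^T (F(u) - u)
        z = V.T @ u + np.linalg.solve(np.eye(m) - V.T @ M @ V,
                                      V.T @ (Fu - u))
        # plain fixed point iteration on the orthogonal complement
        q = Fu - V @ (V.T @ Fu)
        u = V @ z + q
    return u

u = rpm(np.zeros(2))
print(np.allclose(u, u_exact))  # True
```

The stabilized iteration converges at the rate of the stable eigenvalues, while the small Newton system handles the unstable directions exactly.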