Scientific Computing Seminar

Date and Place: Thursdays, hybrid (in person in room 32-349 and online via Zoom). See below for the individual dates.

Content

In the Scientific Computing Seminar we host talks by guests and members of the SciComp team as well as by students of mathematics, computer science, and engineering. Everyone interested in the topics is welcome.

List of Talks

Event Information:

  • Thu, 13 Dec 2018

    SC Seminar: Dr. Stefanie Günther

    9:45, SC Seminar Room 32-349

    Dr. Stefanie Günther, SciComp

    Title:
    Simultaneous Parallel-in-Layer Training for Deep Residual Networks

    Abstract:

    Deep residual networks (ResNets) have shown great promise for modeling complex data relations, with applications in image classification, speech recognition, and text processing, among others. Despite rapid methodological developments, compute times for ResNet training can still be tremendous, on the order of hours or even days. While common approaches to reducing training runtimes mostly involve data parallelism, the sequential propagation through the network layers creates a scalability barrier: training runtimes increase linearly with the number of layers.

    This talk presents an approach that enables concurrency across the network layers and thus overcomes this scalability barrier. The proposed method is inspired by the fact that the propagation through a ResNet can be interpreted as an optimal control problem. In this context, the discrete network layers are interpreted as the discretization of a time-continuous dynamical system. Recent advances in parallel-in-time integration and optimization methods can thus be leveraged to speed up training. In particular, an iterative multigrid-reduction-in-time approach will be discussed, which recursively divides the time domain (i.e., the layers) into multiple time chunks that can be processed in parallel on multiple compute units. Additionally, the multigrid iterations enable a simultaneous optimization framework in which weight updates are based on inexact gradient information.
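
    The interpretation of layers as time steps can be illustrated with a minimal sketch. It is not the speaker's implementation and does not include the multigrid-reduction-in-time algorithm; it only shows, under stated assumptions, that a residual update u_{n+1} = u_n + h * F(u_n, theta_n) is one forward Euler step of du/dt = F(u, theta(t)). The layer function layer_rhs, the tanh activation, the scalar weights, and the step size h = T / N are all hypothetical choices made for illustration.

```python
import math


def layer_rhs(u, weight, bias):
    """Hypothetical layer function F(u, theta) = tanh(weight * u + bias), elementwise."""
    return [math.tanh(weight * x + bias) for x in u]


def resnet_forward(u0, params, h):
    """Apply the residual update u_{n+1} = u_n + h * F(u_n, theta_n) layer by layer.

    Each layer is one forward Euler step of size h for du/dt = F(u, theta(t)).
    The loop is inherently sequential: layer n+1 needs the output of layer n.
    """
    u = list(u0)
    for weight, bias in params:
        f = layer_rhs(u, weight, bias)
        u = [x + h * fx for x, fx in zip(u, f)]
    return u


def make_params(num_layers):
    """Illustrative parameters: the same (weight, bias) pair in every layer."""
    return [(0.5, 0.1)] * num_layers


# Same final time T, discretized with few and with many layers.
T = 1.0
u0 = [1.0, -0.5]
coarse = resnet_forward(u0, make_params(4), T / 4)
fine = resnet_forward(u0, make_params(64), T / 64)
```

    With a fixed final time T, adding layers refines the time grid, so coarse and fine approximate the same continuous trajectory. The sequential loop in resnet_forward is the scalability barrier described above; parallel-in-time methods split that loop into chunks of layers that can be processed concurrently.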