Scientific Computing Seminar

Date and Place: Thursdays, hybrid (live in room 32-349 / online via Zoom). For detailed dates, see below!

Content

In the Scientific Computing Seminar we host talks by guests, members of the SciComp team, and students of mathematics, computer science and engineering. Everybody interested in the topics is welcome.

List of Talks

Event Information:

  • Thu, 16 Sep 2021

    SC Seminar: Shalini Shalini

    10:00, Online

    Shalini Shalini, TU Kaiserslautern

    Title: Sparse Deep Neural Network and Hyperparameter Optimization

    Abstract:

    Despite the considerable success of deep learning in recent years, it is still challenging to deploy state-of-the-art deep neural networks due to their high computational and memory cost. Recent deep learning research has focused on optimally generating sparse neural networks using nonsmooth regularization such as the L_1 and L_{2,1} norms. However, the resulting training problem is nonsmooth, so convergence is not guaranteed with the conventional stochastic gradient descent (SGD) approach. A recent solution is the Proximal Stochastic Gradient Descent (ProxSGD) optimizer, which solves this nonsmooth optimization problem and converges considerably faster. In practice, the performance of ProxSGD can be sensitive to the precise choice of its hyperparameters. The main focus of this thesis is to effectively train sparse neural networks through weight pruning and filter pruning using the ProxSGD optimizer together with hyperparameter optimization. A new approach, GSparsity, is introduced for the efficient implementation of filter pruning.

    Firstly, Bayesian optimization and evolutionary algorithms are used to optimize the hyperparameters of ProxSGD, yielding a range of hyperparameters that helps achieve good accuracy and compression rates; for example, with DenseNet-201 on the CIFAR100 dataset, an accuracy of 72.01% is achieved with a compression rate of 27.24x (96.33% of the weights are pruned), and with ResNet-56 on CIFAR10, 93% accuracy is achieved while removing 93.51% of the parameters, without any loss relative to the baseline accuracy. Secondly, experiments show that the performance of ProxSGD (in terms of accuracy and compression rate) improves when the remaining weights are fine-tuned with the Adam optimizer and a cosine learning-rate scheduler. Thirdly, the GSparsity approach via ProxSGD is proposed for filter pruning and is shown empirically to achieve new state-of-the-art results for filter pruning.

    How to join:

    The talk will be held online via Zoom. You can join using the link https://uni-kl-de.zoom.us/j/94636397127?pwd=Y1g4dGVFQitzUHVRQUFpcFB4WVFKQT09.
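
As an illustration of the proximal step mentioned in the abstract: below is a minimal, hypothetical NumPy sketch of a proximal-gradient update with L_1 regularization, where the soft-thresholding proximal map sets small weights exactly to zero and thereby produces sparsity. This is not the ProxSGD implementation used in the thesis; all names and values are illustrative.

    import numpy as np

    def prox_l1(w, threshold):
        # Proximal operator of the L1 norm: elementwise soft-thresholding.
        return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

    def proximal_gradient_step(w, grad, lr=0.1, l1_weight=0.01):
        # Gradient step on the smooth part of the loss, followed by the
        # proximal map of the nonsmooth L1 regularizer. Entries whose
        # magnitude falls below lr * l1_weight become exactly zero.
        return prox_l1(w - lr * grad, lr * l1_weight)

    # Toy problem: minimize 0.5 * ||w - target||^2 + l1_weight * ||w||_1
    target = np.array([0.8, -0.003, 0.0, 0.5, 0.001])
    w = np.zeros_like(target)
    for _ in range(200):
        grad = w - target  # gradient of the smooth quadratic part
        w = proximal_gradient_step(w, grad)
    print(w)  # small entries of target are pruned exactly to zero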