Experimental evaluation of the performance of Gpipe parallelism

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Pipeline parallelism is a recently proposed model-parallelism paradigm for efficiently training very large Deep Neural Network (DNN) models across multiple accelerators. Gpipe, a popular pipeline parallelism scheme, has been integrated into the PyTorch framework. Training a model with Gpipe involves choosing many parameters, e.g., determining a DNN model partitioning scheme for the given number of accelerators and selecting the number of GPUs for training. It is therefore crucial to investigate how different Gpipe configurations affect training performance and to identify the scenarios for which Gpipe is suitable. This paper presents a systematic evaluation of Gpipe performance under various settings, including different DNN models, GPU types, GPU counts, datasets, and model partition strategies. The experiments reveal several counterintuitive results: training a DNN model without Gpipe can outperform training with it, and using more GPUs does not guarantee better performance; sometimes fewer GPUs train faster under Gpipe. Moreover, the results show that the GPU type, model size, dataset size, and model partition scheme clearly influence training speed. Based on these observations, the paper proposes a theoretical model to estimate the performance gain ratio of Gpipe under different setups.
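The paper's fitted performance model is not reproduced in this abstract, but the counterintuitive results are consistent with the standard GPipe "bubble" analysis: splitting a mini-batch into M micro-batches across K pipeline stages takes M + K - 1 pipeline steps, so each stage is idle for a (K - 1)/(M + K - 1) fraction of the time. A minimal sketch of this idealised estimate (an illustrative assumption, ignoring communication and re-materialisation costs, not the paper's model):

```python
def pipeline_utilisation(num_stages: int, num_microbatches: int) -> float:
    """Fraction of time each stage does useful work under an idealised
    GPipe schedule: M micro-batches over K stages need M + K - 1 steps,
    of which only M are productive per stage."""
    k, m = num_stages, num_microbatches
    return m / (m + k - 1)


def estimated_speedup(num_stages: int, num_microbatches: int) -> float:
    """Idealised speedup over single-device training: K stages scaled
    down by the pipeline-bubble utilisation factor."""
    return num_stages * pipeline_utilisation(num_stages, num_microbatches)


if __name__ == "__main__":
    # With no micro-batching (M = 1) the pipeline serialises and the
    # estimated speedup collapses to 1.0 regardless of GPU count --
    # one way more GPUs can fail to help.
    print(estimated_speedup(4, 1))   # 1.0
    print(estimated_speedup(4, 8))   # 4 * 8/11 ~ 2.91
```

Under this sketch, adding stages only pays off when the micro-batch count grows with them, which mirrors the abstract's observation that fewer GPUs can sometimes be the better Gpipe configuration.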

Original language: English
Pages (from-to): 107-118
Number of pages: 12
Journal: Future Generation Computer Systems
Volume: 147
DOIs
Publication status: Published - Oct 2023

Keywords

  • Distributed Deep Neural Networks
  • Gpipe
  • Performance evaluation
  • Pipeline parallelism

