TY - JOUR
T1 - Experimental evaluation of the performance of Gpipe parallelism
AU - Zhang, Peng
AU - Lee, Brian
AU - Qiao, Yuansong
N1 - Publisher Copyright:
© 2023 The Author(s)
PY - 2023/10
Y1 - 2023/10
N2 - Pipeline parallelism is a recently proposed model parallelism paradigm for efficiently training very large Deep Neural Network (DNN) models across multiple accelerators. Gpipe, a popular pipeline parallelism scheme, has been integrated into the PyTorch framework. Training a model with Gpipe involves choosing a large number of parameters, e.g. the DNN model partitioning scheme for a given number of accelerators and the number of GPUs used for training. It is therefore crucial to investigate the effects of different Gpipe configurations on training performance and to assess the scenarios for which Gpipe is suitable. This paper presents a systematic evaluation of Gpipe performance under various settings, including different DNN models, GPU types, GPU numbers, datasets, and model partition strategies. The experiments reveal several counterintuitive results: training a DNN model without Gpipe performs better than training with it, and using more GPUs does not guarantee better performance; when training with Gpipe, fewer GPUs are sometimes faster. The results also show that the GPU type, model size, dataset size, and DNN model partition scheme clearly influence training speed. Based on these observations, the paper proposes a theoretical model to estimate the performance gain ratio of Gpipe under different setups.
AB - Pipeline parallelism is a recently proposed model parallelism paradigm for efficiently training very large Deep Neural Network (DNN) models across multiple accelerators. Gpipe, a popular pipeline parallelism scheme, has been integrated into the PyTorch framework. Training a model with Gpipe involves choosing a large number of parameters, e.g. the DNN model partitioning scheme for a given number of accelerators and the number of GPUs used for training. It is therefore crucial to investigate the effects of different Gpipe configurations on training performance and to assess the scenarios for which Gpipe is suitable. This paper presents a systematic evaluation of Gpipe performance under various settings, including different DNN models, GPU types, GPU numbers, datasets, and model partition strategies. The experiments reveal several counterintuitive results: training a DNN model without Gpipe performs better than training with it, and using more GPUs does not guarantee better performance; when training with Gpipe, fewer GPUs are sometimes faster. The results also show that the GPU type, model size, dataset size, and DNN model partition scheme clearly influence training speed. Based on these observations, the paper proposes a theoretical model to estimate the performance gain ratio of Gpipe under different setups.
KW - Distributed Deep Neural Networks
KW - Gpipe
KW - Performance evaluation
KW - Pipeline parallelism
UR - http://www.scopus.com/inward/record.url?scp=85159460416&partnerID=8YFLogxK
U2 - 10.1016/j.future.2023.04.033
DO - 10.1016/j.future.2023.04.033
M3 - Article
AN - SCOPUS:85159460416
SN - 0167-739X
VL - 147
SP - 107
EP - 118
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -
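
Note: the abstract refers to Gpipe as integrated into PyTorch, where training requires choosing a model partition scheme, a GPU count, and a micro-batch (chunk) count. The sketch below is a minimal, hypothetical illustration of such a setup using the GPipe-style pipeline available in PyTorch versions contemporary with the paper (torch.distributed.pipeline.sync.Pipe); the layer sizes, two-GPU split, chunk count, and names are illustrative assumptions, not the paper's actual experimental configuration.

# Hypothetical minimal sketch (not from the paper): one training step with the
# GPipe-style pipeline shipped in PyTorch. Layer sizes, partition, and chunks are assumptions.
import torch
import torch.nn as nn
from torch.distributed import rpc
from torch.distributed.pipeline.sync import Pipe

def main():
    # Pipe relies on the RPC framework, even for a single-process run.
    rpc.init_rpc("worker", rank=0, world_size=1)

    # The wrapped model must be an nn.Sequential whose stages are already placed on devices;
    # this two-way split over cuda:0 / cuda:1 is one possible partition scheme.
    stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
    stage1 = nn.Sequential(nn.Linear(4096, 10)).to("cuda:1")
    model = Pipe(nn.Sequential(stage0, stage1), chunks=4)  # 4 micro-batches per mini-batch

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Dummy data: inputs on the first stage's device, targets on the last stage's device.
    x = torch.randn(64, 1024, device="cuda:0")
    y = torch.randint(0, 10, (64,), device="cuda:1")

    output = model(x).local_value()  # Pipe's forward returns an RRef
    loss = loss_fn(output, y)
    loss.backward()
    optimizer.step()

    rpc.shutdown()

if __name__ == "__main__":
    main()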