# Technical Report MSC-2021-39

TR#: MSC-2021-39

Class: MSC

Title: Scalable Distributed Deep Learning With Model Parallelism

Authors: Saar Eliad

Supervisors: Assaf Schuster

PDF: Currently accessible only within the Technion network

Abstract: This work discusses a particular case of Deep Learning where the model is too large to fit into the memory of a single GPU during training. Specifically, it focuses on fine-tuning giant neural networks on commodity hardware with automatic pipeline model parallelism. Fine-tuning is an increasingly common technique that leverages transfer learning to dramatically expedite the training of huge, high-quality models. Critically, it holds the potential to make giant state-of-the-art models, pre-trained on high-end super-computing-grade systems, readily available to users who lack access to such costly resources. Unfortunately, this potential is still difficult to realize because the models often do not fit in the memory of a single commodity GPU, making fine-tuning a challenging problem. We present FTPipe, a system that explores a new dimension of pipeline model parallelism, making the efficient multi-GPU execution of fine-tuning tasks for giant neural networks readily accessible. A key novel concept, called Mixed-pipe, allows balancing the compute and memory load on the GPUs by partitioning the model into computational blocks of any granularity while relaxing model topology constraints. Our system goes beyond the synchronization and topology limitations of previous pipeline-parallel approaches, efficiently training a new family of models, including the current state-of-the-art. Our extensive experiments on giant NLP models (BERT-340M, GPT2-1.5B, and T5-3B) show that FTPipe achieves up to 3$\times$ speedup and state-of-the-art accuracy when fine-tuning giant transformers with billions of parameters. These models require from 12GB to 59GB of GPU memory, and FTPipe executes them on 8 commodity RTX2080-Ti GPUs, each with 11GB memory and standard PCIe.

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2021/MSC/MSC-2021-39), rather than to the URL of the PDF files directly. The latter URLs may change without notice.