Parallelized Training of Deep NN – Comparison of Current Concepts and Frameworks

Abstract

Horizontal scalability is a major facilitator of recent advances in deep learning. Common deep learning frameworks offer different approaches for scaling the training process. We operationalize the execution of distributed training using Kubernetes and Helm templates. In this way we lay the groundwork for a systematic comparison of deep learning frameworks. For two of them, TensorFlow and MXNet, we examine their properties with regard to throughput, scalability, and practical ease of use.
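To make the setup described in the abstract concrete, the sketch below shows how a distributed TensorFlow training script typically picks up its cluster role on Kubernetes: a launcher (for example one generated from a Helm template) injects the TF_CONFIG environment variable into each worker pod, and every replica runs the same script. This is a minimal illustration using tf.distribute.MultiWorkerMirroredStrategy, which postdates the paper; the authors' exact training code and cluster layout are not shown here.

```python
import json
import os

import tensorflow as tf

# TF_CONFIG is assumed to be injected by the Kubernetes launcher (e.g. via a
# Helm-templated manifest). Example for a two-worker job:
# {"cluster": {"worker": ["worker-0:2222", "worker-1:2222"]},
#  "task": {"type": "worker", "index": 0}}
print("TF_CONFIG:", os.environ.get("TF_CONFIG", "<not set, running single-node>"))

# Synchronous data-parallel training across all workers listed in TF_CONFIG.
strategy = tf.distribute.MultiWorkerMirroredStrategy()


def make_dataset() -> tf.data.Dataset:
    # Small stand-in dataset; the paper benchmarks larger workloads.
    (x, y), _ = tf.keras.datasets.mnist.load_data()
    x = x.reshape(-1, 784).astype("float32") / 255.0
    return tf.data.Dataset.from_tensor_slices((x, y)).shuffle(60_000).batch(128)


with strategy.scope():
    # Model variables created inside the scope are mirrored across workers.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Each pod executes the same fit() call; gradients are aggregated across workers.
model.fit(make_dataset(), epochs=2)
```

Each worker pod runs this script unchanged; the index in TF_CONFIG determines its role, which is what makes the job easy to template with Helm.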

Type
Publication
Second Workshop on Distributed Infrastructures for Deep Learning
Sebastian Jäger
PhD Student
Hans Peter Zorn
Head of Artificial Intelligence
Stefan Igel
Head of Big Data Solutions
Christian Zirpins
Professor for Distributed Systems