Abstract
As deep learning research has advanced, deep learning frameworks have become important tools for developing deep neural networks. These frameworks greatly shorten network construction and computation time, and their computing power comes from GPUs. How to allocate and use the GPU resources of a heterogeneous cluster effectively across many frameworks is therefore an important problem. This paper proposes DLC (Deep Learning Container Cloud), a container cloud architecture for deep learning designed specifically around GPU resources. Because containers are easy to deploy and migrate, DLC can rapidly deploy deep learning frameworks as containers on a heterogeneous cluster, and it uses nvidia-docker to decouple the GPU driver files from the containers. DLC provides its service as a MESOS framework: after obtaining resources through scheduling, it quickly creates the deep learning framework that the request calls for and loads the specified GPU resources together with the matching runtime libraries, so that a deep learning environment of a specific version can be created rapidly. This is of some significance for advancing the development of deep learning.
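The abstract does not describe how DLC matches requests to resources, so the following is only a minimal sketch of that step under stated assumptions. It models a simplified resource offer (CPUs, memory, idle GPU indices on one agent), accepts the first offer that satisfies a request (first-fit, our choice, not necessarily the paper's policy), and builds the container launch command. All names (`Offer`, `Request`, `LaunchPlan`, `match`) and the image name `dlc/tensorflow:1.4-gpu` are hypothetical. The GPU selection uses the `NV_GPU` environment variable of nvidia-docker 1.x, whose wrapper mounts the host driver volume at run time, which is what lets the image itself carry no driver files.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Offer:
    """Hypothetical, simplified model of a resource offer from one agent."""
    agent: str
    cpus: float
    mem_mb: int
    free_gpus: List[int]  # indices of idle GPUs on this agent


@dataclass
class Request:
    """Hypothetical request for a specific deep learning environment."""
    image: str  # hypothetical image name encoding framework + version
    cpus: float
    mem_mb: int
    gpus: int


@dataclass
class LaunchPlan:
    agent: str
    gpu_ids: List[int]
    env: Dict[str, str]
    argv: List[str]


def match(request: Request, offers: List[Offer]) -> Optional[LaunchPlan]:
    """First-fit: use the first offer with enough CPU, memory and idle GPUs."""
    for offer in offers:
        if (offer.cpus >= request.cpus
                and offer.mem_mb >= request.mem_mb
                and len(offer.free_gpus) >= request.gpus):
            gpu_ids = sorted(offer.free_gpus)[: request.gpus]
            env: Dict[str, str] = {}
            if gpu_ids:
                # nvidia-docker 1.x exposes only the GPUs listed in NV_GPU
                # and mounts the host driver volume into the container.
                env["NV_GPU"] = ",".join(str(i) for i in gpu_ids)
            # Without NV_GPU, nvidia-docker would expose all GPUs, so a
            # CPU-only request falls back to plain docker.
            runner = "nvidia-docker" if gpu_ids else "docker"
            argv = [runner, "run", "-d",
                    "--cpus", str(request.cpus),  # Docker 1.13+ flag
                    "--memory", f"{request.mem_mb}m",
                    request.image]
            return LaunchPlan(offer.agent, gpu_ids, env, argv)
    return None  # no offer fits; a real scheduler would wait for more offers


if __name__ == "__main__":
    offers = [Offer("node1", 8, 16384, [0]), Offer("node2", 16, 65536, [3, 1, 2, 0])]
    plan = match(Request("dlc/tensorflow:1.4-gpu", 4, 8192, 2), offers)
    print(plan)
```

In this example `node1` is skipped because it has only one idle GPU, and `node2` is chosen with GPUs 0 and 1. First-fit keeps the sketch short; a production scheduler would more likely decline unsuitable offers and apply a packing or fairness policy across frameworks.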
Authors
XIAO Yi (肖熠); LU Yong-quan (鲁永泉); XIE Si-ye (谢思烨)
(Computer School, Communication University of China, Beijing 100024, China; High Performance Computing Center, Communication University of China, Beijing 100024, China)
Source
Journal of Communication University of China: Science and Technology (《中国传媒大学学报(自然科学版)》)
2017, No. 6, pp. 16-20 (5 pages)