Abstract
This paper proposes GeoMX, a software framework for geo-distributed machine learning. GeoMX optimizes communication in two respects: the communication architecture and the compressed-transmission mechanism. Accordingly, it introduces a hierarchical parameter server (HiPS) architecture to reduce the number of gradient flows crossing the wide area network (WAN), and a bi-directional sparsification (BiSparse) technique to reduce the size of each flow. In the experiments, GeoMX is deployed across multiple data centers connected over the WAN, while MXNet is deployed inside a single data center on a local area network (LAN). The results show that GeoMX achieves up to 4 times the training efficiency of MXNet, with almost no loss of accuracy.
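As a rough illustration of what gradient sparsification means in general, the sketch below keeps only the k largest-magnitude gradient entries and sends them as (index, value) pairs. It is a generic top-k scheme written for this record, not the BiSparse algorithm from the paper; the function names `sparsify_top_k` and `densify` and the choice of top-k selection are assumptions. A bi-directional variant would presumably apply a routine like this both when workers push gradients and when they pull updates.

```python
import heapq


def sparsify_top_k(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient (hypothetical helper).

    Returns (indices, values, residual): the sparse part to transmit and the
    remainder that stays local.
    """
    if k <= 0:
        return [], [], list(grad)
    # Indices of the k entries with the largest absolute value.
    top = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    indices = sorted(top)
    values = [grad[i] for i in indices]
    residual = list(grad)
    for i in indices:
        residual[i] = 0.0  # transmitted entries are removed from the local remainder
    return indices, values, residual


def densify(indices, values, size):
    """Rebuild a dense vector on the receiving side (hypothetical helper)."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


grad = [0.1, -2.0, 0.03, 1.5, -0.2]
idx, vals, residual = sparsify_top_k(grad, k=2)
restored = densify(idx, vals, len(grad))
```

Here only 2 of the 5 entries would cross the network; the small entries remain in `residual`, which a real system might accumulate into later iterations.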
Authors
LI Zonghang (李宗航)
YU Hongfang (虞红芳)
WANG Yi (汪漪)
Affiliations: University of Electronic Science and Technology of China, Chengdu 611731, China; Southern University of Science and Technology, Shenzhen 518055, China; Peng Cheng Laboratory, Shenzhen 518055, China
Source
ZTE Technology Journal (《中兴通讯技术》)
2020, No. 5, pp. 16-22 (7 pages)
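Funding
National Key Research and Development Program of China (2019YFB1802800)
Peng Cheng Laboratory Greater Bay Area Future Network Experiment and Application Environment Project (LZC0019)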
Keywords
big data
artificial intelligence
geo-distributed machine learning
gradient sparsification