Funding: partially supported by the Computing Power Networks and New Communication Primitives project under Grant No. HC-CN-2020120001, the National Natural Science Foundation of China under Grant No. 62102066, and the Open Research Projects of Zhejiang Lab under Grant No. 2022QA0AB02.
Abstract: In distributed machine learning (DML) based on the parameter server (PS) architecture, an unbalanced communication load distribution across PSs slows model synchronization significantly in heterogeneous networks because bandwidth is poorly utilized. To address this problem, we propose a network-aware adaptive PS load distribution scheme that accelerates model synchronization by proactively adjusting the communication load on each PS according to observed network states. We evaluate the proposed scheme on MXNet, a real-world distributed training platform, and the results show that it achieves up to a 2.68x speed-up of model training in dynamic, heterogeneous network environments.
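To illustrate the idea of adapting PS load to network state, here is a minimal sketch in which each server's share of the model is set proportional to its currently measured bandwidth. The function name, the proportional policy, and the bandwidth-probing step are illustrative assumptions, not the paper's actual algorithm or MXNet's API.

```python
# Minimal sketch of network-aware PS load distribution (illustrative only).
# Assumption: the model is split into key-value shards, and each server's
# share is set proportional to its currently measured bandwidth.

def rebalance_shards(model_size_bytes, bandwidths_mbps):
    """Return the number of bytes assigned to each parameter server,
    proportional to its measured bandwidth."""
    total_bw = sum(bandwidths_mbps)
    shares = [bw / total_bw for bw in bandwidths_mbps]
    return [int(model_size_bytes * s) for s in shares]

# Example: a 400 MB model and three PSs whose measured bandwidths are
# 10, 5, and 1 Gbps; the slow server receives proportionally less traffic
# in each synchronization round.
assignment = rebalance_shards(400 * 2**20, [10_000, 5_000, 1_000])
print(assignment)  # roughly 262 MB, 131 MB, 26 MB
```

In such a scheme the assignment would be recomputed whenever the monitored bandwidths change, so a congested server stops being the bottleneck of each synchronization round.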
基金supported by the National Key R&D Program of China(2018YFB1003404)the National Natural Seience Foundation of China(Grant Nos.62072083,U1811261,61902366)+2 种基金Basal Research Fund(N180716010)Liao Ning Revitalization Talents Program(XLYC1807158)the China Postdoctoral Science Foundation(2020T130623).
Abstract: The parameter server (PS), the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been studied extensively. However, existing PS-based systems typically keep the model in memory. Under such memory constraints, machine learning (ML) developers cannot train large-scale ML models on their rather small local clusters, and renting large-scale cloud servers is often economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirements of large-scale machine learning tasks by storing high-dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for the parameters to reduce disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel model of computation (WSP) is proposed to strike the right balance between inconsistent parameter versions (staleness) and inconsistent execution progress (stragglers). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.
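One plausible reading of a worker-selection rule is that the server aggregates each round as soon as the fastest k of n workers have reported, which bounds staleness (small k drifts toward asynchronous updates) and straggler delay (k = n is fully synchronous). The sketch below illustrates that k-of-n policy only; the names and the policy itself are assumptions, not DRPS's actual WSP definition.

```python
# Minimal sketch of a k-of-n worker-selection aggregation rule (illustrative only).
import queue

def wsp_aggregate(update_queue, k):
    """Collect gradients from the first k workers to finish this round and
    return their element-wise average; slower workers are skipped this round."""
    collected = []
    while len(collected) < k:
        _worker_id, grad = update_queue.get()  # blocks until an update arrives
        collected.append(grad)
    return [sum(vals) / k for vals in zip(*collected)]

# Toy usage: three workers report, the server aggregates the fastest two.
# With k = n this degenerates to fully synchronous training; with k = 1 it
# behaves like fully asynchronous updates.
q = queue.Queue()
for worker_id, grad in [(0, [1.0, 2.0]), (1, [3.0, 4.0]), (2, [5.0, 6.0])]:
    q.put((worker_id, grad))
print(wsp_aggregate(q, k=2))  # [2.0, 3.0]
```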
Abstract: Standard mold bases are widely used in mold design, but most follow enterprise-specific standards, so drawing them involves a large amount of repetitive work and mold base data are not updated in time. This paper describes a method for developing a 3D standard mold base library with SolidWorks as the supporting software, C# as the development language, and SQL Server 2000 as the database. A system architecture combining the B/S and C/S modes enables networked management, querying, and maintenance of mold base data, keeps the standard mold base data accurate and up to date, and supports client-side parameterization of 3D mold base assemblies, which significantly improves design efficiency.
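The data-driven core of such a library is a lookup from a standard spec code to the dimension parameters that drive a parametric CAD template. The sketch below shows only that lookup idea; the real system is written in C# against the SolidWorks API and SQL Server 2000, and the table and column names here are hypothetical.

```python
# Minimal sketch of spec-to-parameters lookup for a standard mold base library
# (illustrative only; table/column names are assumptions, sqlite3 stands in for
# the SQL Server database used in the paper).
import sqlite3

def lookup_mold_base(conn, spec):
    """Fetch the dimension parameters of a standard mold base by its spec code,
    returning a dict that a parametric CAD template could be driven with."""
    row = conn.execute(
        "SELECT spec, width, length, plate_a, plate_b FROM mold_base WHERE spec = ?",
        (spec,),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown mold base spec: {spec}")
    return dict(zip(("spec", "width", "length", "plate_a", "plate_b"), row))

# Toy usage with an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mold_base (spec TEXT, width REAL, length REAL, plate_a REAL, plate_b REAL)")
conn.execute("INSERT INTO mold_base VALUES ('2530-A', 250, 300, 60, 70)")
print(lookup_mold_base(conn, "2530-A"))
```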
Abstract: Training efficiency and test accuracy are important factors in judging the scalability of distributed deep learning. In this dissertation, we explore the impact of noise introduced into the Mixed National Institute of Standards and Technology database (MNIST) and the CIFAR-10 dataset, which are selected as benchmarks for distributed deep learning. The noise in the training set is manually divided into cross-noise and random noise, and each type of noise is injected at different ratios. To minimize the influence of parameter interactions in distributed deep learning, we choose a compressed model (SqueezeNet) together with the proposed flexible communication method, which reduces the communication frequency, and we evaluate the influence of noise on distributed training under both synchronous and asynchronous stochastic gradient descent. On the TensorFlowOnSpark experimental platform, we measure the training accuracy at different noise ratios and the training time for different numbers of nodes. Cross-noise in the training set not only decreases the test accuracy but also increases the time needed for distributed training; in other words, noise degrades the scalability of distributed deep learning.
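The abstract distinguishes cross-noise from random noise injected at a chosen ratio. A minimal sketch of one plausible reading follows, assuming random noise reassigns a label uniformly at random while cross-noise flips a label to one specific confusable class; the paper's exact definitions may differ, and the function names are hypothetical.

```python
# Minimal sketch of injecting label noise at a given ratio (illustrative only).
import random

def add_random_noise(labels, ratio, num_classes, seed=0):
    """Reassign a `ratio` fraction of labels uniformly at random."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), int(ratio * len(noisy))):
        noisy[i] = rng.randrange(num_classes)
    return noisy

def add_cross_noise(labels, ratio, class_pairs, seed=0):
    """Flip a `ratio` fraction of the affected classes to their confusable
    counterpart; class_pairs maps a class to its counterpart, e.g. {3: 5, 5: 3}."""
    rng = random.Random(seed)
    noisy = list(labels)
    candidates = [i for i, y in enumerate(noisy) if y in class_pairs]
    for i in rng.sample(candidates, int(ratio * len(candidates))):
        noisy[i] = class_pairs[noisy[i]]
    return noisy

# Example: corrupt 20% of a toy label set in each of the two ways.
labels = [3, 5, 3, 1, 5, 0, 3, 5, 1, 0]
print(add_random_noise(labels, 0.2, num_classes=10))
print(add_cross_noise(labels, 0.2, {3: 5, 5: 3}))
```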