Funding: This work was supported by the National Key R&D Program of China (No. 2018AAA0100500), the National Natural Science Foundation of China (Nos. 61972085, 61872079, and 61632008), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9), the Southeast University-China Mobile Research Institute Joint Innovation Center (No. R21701010102018), and the University Synergy Innovation Program of Anhui Province (No. GXXT2020-012), and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization, the Fundamental Research Funds for the Central Universities, the CCF-Baidu Open Fund (No. 2021PP15002000), and the Future Network Scientific Research Fund Project (No. FNSRFP-2021-YB-02).
Abstract: Numerous neural network (NN) applications are now being deployed on mobile devices. These applications usually involve large amounts of computation and data while requiring low inference latency, which challenges the computing capability of mobile devices. Moreover, device lifetime and performance depend on temperature. Hence, in many scenarios, such as industrial production and automotive systems, where ambient temperatures are usually high, it is important to control device temperatures to maintain steady operation. In this paper, we propose a thermal-aware channel-wise heterogeneous NN inference algorithm. It consists of two parts: the thermal-aware dynamic frequency (TADF) algorithm and the heterogeneous-processor single-layer workload distribution (HSWD) algorithm. Depending on a mobile device's architectural characteristics and the environmental temperature, TADF adjusts the running speeds of the central processing unit and the graphics processing unit; HSWD then distributes the workload of each layer in the NN model according to each processor's running speed and the characteristics of both the layers and the heterogeneous processors. Experimental results with representative NNs and mobile devices show that the proposed method improves on-device inference speed by 21%-43% over the traditional inference method.
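The two-stage idea in the abstract can be sketched in a few lines. The sketch below is purely illustrative and is not the paper's actual TADF/HSWD algorithm: the temperature thresholds, frequency levels, and speed numbers are invented, and layer workload is modeled simply as a channel count split in proportion to each processor's effective speed.

```python
# Hypothetical sketch: (1) pick a clock level from the current
# temperature (TADF-like throttling), then (2) split one layer's
# output channels between CPU and GPU in proportion to their
# effective speeds (HSWD-like distribution).
# All thresholds and frequency levels below are invented examples.

def pick_frequency(temp_c, levels=(0.6, 1.2, 1.8, 2.4)):
    """Return a clock (GHz) that backs off as the device heats up."""
    if temp_c < 40:
        return levels[3]          # cool: full speed
    if temp_c < 55:
        return levels[2]
    if temp_c < 70:
        return levels[1]
    return levels[0]              # hot: strongest throttling

def split_channels(n_channels, cpu_speed, gpu_speed):
    """Give each processor a channel share proportional to its speed."""
    cpu_share = round(n_channels * cpu_speed / (cpu_speed + gpu_speed))
    return cpu_share, n_channels - cpu_share

cpu_ghz = pick_frequency(temp_c=60)   # CPU is hot -> throttled level
gpu_ghz = pick_frequency(temp_c=45)   # GPU is cooler -> higher level
cpu_ch, gpu_ch = split_channels(64, cpu_ghz, gpu_ghz)
print(cpu_ghz, gpu_ghz, cpu_ch, gpu_ch)
```

In a real system the speed estimate per layer would come from profiling rather than the clock alone, since convolutional and fully connected layers scale differently on the CPU and GPU.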
Funding: This work was supported by the National Key R&D Program of China (No. 2017YFB1003000), the National Natural Science Foundation of China (Nos. 61572129, 61602112, 61502097, 61702096, 61320106007, and 61632008), the International S&T Cooperation Program of China (No. 2015DFA10490), the Natural Science Foundation of Jiangsu Province (Nos. BK20160695 and BK20170689), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), and the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9), and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization and the Collaborative Innovation Center of Wireless Communications Technology.
Abstract: Cloud data centers, such as Amazon EC2, host myriad big data applications using virtual machines (VMs). Because these applications are communication-intensive, optimizing network transfer between VMs is critical to their performance and to the network utilization of data centers. Previous studies have addressed this issue by scheduling network flows with coflow semantics or by optimizing VM placement with traffic considerations. However, coflow scheduling and VM placement have been conducted orthogonally. In fact, these two mechanisms are mutually dependent, and optimizing these two complementary degrees of freedom independently turns out to be suboptimal. In this paper, we present VirtCO, a practical framework that jointly schedules coflows and places VMs ahead of VM launch to optimize the overall performance of data center applications. We model the joint coflow scheduling and VM placement optimization problem and propose effective heuristics for solving it. We further implement VirtCO with OpenStack and deploy it in a testbed environment. Extensive evaluation with real-world traces shows that compared with state-of-the-art solutions, VirtCO reduces the average coflow completion time by up to 36.5%. The framework is also compatible with, and readily deployable within, existing data center architectures.
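To make the coupling between the two decisions concrete, here is a toy sketch, not VirtCO's actual heuristics: traffic is simplified to pairwise VM flows, heavy-talking pairs are greedily co-located so their traffic never crosses the network, and the remaining cross-host flows are ordered smallest-first, a common coflow-style heuristic. Host count, slot capacity, and traffic volumes are made-up examples, and the sketch assumes there are enough slots for all VMs.

```python
# Toy version of the two coupled decisions: place VMs first, then
# schedule whatever traffic still crosses hosts.

def place_vms(traffic, n_hosts, slots=2):
    """traffic: {(vm_a, vm_b): bytes}. Greedily co-locate the
    heaviest-talking unplaced pairs on the emptiest host."""
    placement, free = {}, {h: slots for h in range(n_hosts)}
    for (a, b), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        unplaced = [v for v in (a, b) if v not in placement]
        if len(unplaced) == 2:
            host = max(free, key=free.get)      # host with most room
            if free[host] >= 2:                 # co-locate the pair
                placement[a] = placement[b] = host
                free[host] -= 2
                continue
        for v in unplaced:                      # fall back: place singly
            host = max(free, key=free.get)
            placement[v] = host
            free[host] -= 1
    return placement

def cross_host_order(traffic, placement):
    """Flows whose endpoints landed on different hosts,
    scheduled smallest-total-bytes first."""
    cross = {p: v for p, v in traffic.items()
             if placement[p[0]] != placement[p[1]]}
    return sorted(cross, key=cross.get)

traffic = {("v1", "v2"): 900, ("v3", "v4"): 500, ("v1", "v3"): 100}
placement = place_vms(traffic, n_hosts=2)
print(placement)
print(cross_host_order(traffic, placement))
```

The point the abstract makes shows up even here: deciding placement without looking at the flow sizes (or vice versa) can leave a heavy pair split across hosts, inflating the completion time that the scheduler must then manage.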
Funding: This work was supported by the National Key R&D Program of China (No. 2017YFB1003000), the National Natural Science Foundation of China (Nos. 61872079, 61572129, 61602112, 61502097, 61702096, 61320106007, 61632008, and 61702097), the Natural Science Foundation of Jiangsu Province (Nos. BK20160695 and BK20170689), the Fundamental Research Funds for the Central Universities (No. 2242018k1G019), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), and the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9), and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization and the Collaborative Innovation Center of Wireless Communications Technology.
Abstract: With the continuous enrichment of cloud services, an increasing number of applications are being deployed in data centers. These emerging applications are often communication-intensive and data-parallel, and their performance is closely related to the underlying network. Owing to their distributed nature, the applications consist of tasks, each involving a collection of parallel flows. Traditional techniques that optimize flow-level metrics are agnostic to task-level requirements, leading to poor application-level performance. In this paper, we address the heterogeneous task-level requirements of applications and propose task-aware flow scheduling. First, we model tasks' sensitivity to their completion time by utilities. Second, on the basis of Nash bargaining theory, we establish a flow scheduling model with heterogeneous utility characteristics and analyze it using the Lagrange multiplier method and the Karush-Kuhn-Tucker (KKT) conditions. Third, we propose two utility-aware bandwidth allocation algorithms with different practical constraints. Finally, we present Tasch, a system that enables tasks to maintain high utilities and guarantees fairness among tasks. To demonstrate the feasibility of the system, we conduct comprehensive evaluations with real-world traffic traces. Compared with per-flow mechanisms, communication stages complete up to 1.4× faster on average, task utilities increase by up to 2.26×, and task fairness improves by up to 8.66× under Tasch.
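A minimal worked example of the kind of utility-based allocation the abstract describes (not Tasch's actual algorithms): maximize the Nash bargaining objective sum_i w_i * log(x_i) subject to sum_i x_i <= C. The Lagrange/KKT conditions give d/dx_i (w_i log x_i - lambda x_i) = 0, hence x_i = w_i / lambda, and the capacity constraint fixes lambda = (sum_j w_j) / C, yielding the closed form x_i = w_i * C / sum_j w_j: bandwidth proportional to each task's utility weight. The weights and link capacity below are made up.

```python
# Closed-form Nash-bargaining allocation for logarithmic utilities:
# each task's share of capacity C is proportional to its weight.

def nash_allocation(weights, capacity):
    """Maximize sum_i w_i*log(x_i) s.t. sum_i x_i = capacity.
    KKT conditions give x_i = w_i * capacity / sum(weights)."""
    total = sum(weights)
    return [w * capacity / total for w in weights]

# Three tasks with utility weights 1, 2, 5 sharing an 80 Mb/s link:
alloc = nash_allocation([1, 2, 5], 80.0)
print(alloc)   # -> [10.0, 20.0, 50.0]
```

With general (non-logarithmic) utilities there is no such closed form, which is why the paper resorts to algorithmic allocation under practical constraints; the log case is the textbook proportional-fairness special case.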