Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010902, the National Basic Research 973 Program of China under Grant No. 2011CB302504, and the National Natural Science Foundation of China under Grant Nos. 61202055, 60925009, 60921002, and 61100011.
Abstract: Efficiency of batch processing is becoming increasingly important for many modern commercial service centers, e.g., clusters and cloud computing datacenters. However, periodic resource contention has become a major performance obstacle for applications running concurrently on mainstream CMP servers. I/O contention is one such obstacle: it can seriously degrade both the co-running performance of batch jobs and the overall system throughput. In this paper, a dynamic I/O-aware scheduling algorithm is proposed to mitigate the impact of I/O contention and to improve co-running performance in batch processing. We set up our environment on an 8-socket, 64-core server in the Dawning Linux Cluster, and evaluate fifteen workloads ranging from 8 to 256 jobs. The experimental results show significant throughput improvements, ranging from 7% to 431%, along with noticeable improvements in workload slowdown and in the average runtime of each job. These results indicate that a well-tuned dynamic I/O-aware scheduler benefits batch-mode services and can also improve resource utilization on modern service platforms through higher throughput.
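The abstract does not include the scheduling algorithm itself; as a rough, hypothetical illustration of the general idea behind I/O-aware co-scheduling (not the authors' algorithm), the Python sketch below caps the number of I/O-intensive jobs dispatched at once and defers the rest until contention eases. The `Job` fields and the `max_io_jobs`/`io_threshold` knobs are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    io_intensity: float  # assumed fraction of time spent in I/O (0.0-1.0)

def dispatch(pending, free_cores, running, max_io_jobs=4, io_threshold=0.5):
    """Pick jobs for the free cores while capping concurrent I/O-heavy jobs.

    max_io_jobs and io_threshold are illustrative knobs, not values from the
    paper; a real scheduler would derive them from monitored I/O contention.
    """
    io_running = sum(1 for j in running if j.io_intensity >= io_threshold)
    scheduled, deferred = [], deque()
    while pending and len(scheduled) < free_cores:
        job = pending.popleft()
        if job.io_intensity >= io_threshold and io_running >= max_io_jobs:
            deferred.append(job)  # postpone to avoid extra I/O contention
            continue
        scheduled.append(job)
        if job.io_intensity >= io_threshold:
            io_running += 1
    pending.extendleft(reversed(deferred))  # keep deferred jobs at the front
    return scheduled

# Example: 3 free cores, one I/O-heavy job already running.
running = [Job("backup", 0.9)]
pending = deque([Job("grep", 0.8), Job("compress", 0.2), Job("simulate", 0.1)])
print([j.name for j in dispatch(pending, 3, running, max_io_jobs=1)])
```

With `max_io_jobs=1` and an I/O-heavy job already running, the I/O-heavy `grep` job is deferred and only the compute-leaning jobs are dispatched.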
Abstract: Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, a communication network that efficiently matches the needs of the target application is critical to overall performance. Wormhole packet-switching network-on-chip (NoC) solutions are replacing conventional shared buses to cope with the scalability and complexity challenges that accompany the growing number of processing elements (PEs). However, the quest for high-performance networks has led to very complex and resource-expensive NoC designs, leaving little room for the real computing force, i.e., the PEs. Moreover, many techniques offer little or no performance gain when network traffic is light, while still increasing router resource usage. We argue that computation remains the primary task of a multiprocessor and that sufficient resources should be reserved for the PEs. This paper presents a novel design and implementation of a resource-efficient communication network for multiprocessors on FPGAs. We reduce both the number of routers required for a given number of PEs, by introducing a new PE-router topology, and the resource requirement of each router. Our communication network relies on the NEWS channels to transfer packets in a pipelined fashion along the path determined by the routing network. Implementation results on various Xilinx FPGAs show good performance in the typical range of network load for multiprocessor applications.
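As background for how a wormhole NoC forwards packets, the sketch below shows generic dimension-ordered XY routing on a 2D mesh. It is only a textbook illustration and does not reflect the PE-router topology or the NEWS-channel pipeline proposed in the paper.

```python
def xy_route(src, dst):
    """Return the hop directions (N/E/W/S) for dimension-ordered XY routing
    on a 2D mesh: a generic wormhole-NoC example, not the paper's design."""
    (sx, sy), (dx, dy) = src, dst
    hops = []
    # Route along X first, then along Y (deadlock-free on a mesh).
    while sx != dx:
        hops.append("E" if dx > sx else "W")
        sx += 1 if dx > sx else -1
    while sy != dy:
        hops.append("N" if dy > sy else "S")
        sy += 1 if dy > sy else -1
    return hops

print(xy_route((0, 0), (2, 1)))  # ['E', 'E', 'N']
```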
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60970002, 60833004, 60773146, and 60673145.
Abstract: As the number of cores in chip multiprocessors (CMPs) increases, the cache coherence protocol has become a key issue in their integration. Supporting cache coherence in large chip multiprocessors still faces three hurdles: design complexity, performance, and scalability. This paper proposes Cache Coherent Network on Chip (CCNoC), a scheme that decouples cache coherency maintenance from the processors and shared L2 caches and implements it entirely in the network on chip, freeing the processors and shared L2 caches from the chore of maintaining coherency and thereby reducing the design complexity of CMPs. CCNoC also improves the performance of the cache coherence protocol by reducing directory access latency, and enhances scalability by avoiding massive directory overhead in the shared L2 caches. In CCNoC, coherence state caches and active directory caches are implemented in the network interface components of the network on chip: the former maintain coherence states for blocks in the L1 caches, while the latter manage directory information for recently accessed blocks in the L2 caches. CCNoC thus provides a scalable CMP framework for tackling cache coherency, which is the foundation of CMPs. This paper evaluates the performance of CCNoC. Experimental results show that for a 16-core system, CCNoC improves performance over a conventional chip multiprocessor by 3% on average and by 10% at best, while reducing storage overhead by 1.8% and directory storage by 88%, demonstrating good scalability.
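To make the directory bookkeeping that CCNoC offloads into the network more concrete, here is a minimal, generic full-map directory sketch (not CCNoC's coherence state caches or active directory caches): it tracks the sharers of each block and invalidates them when another core writes.

```python
class Directory:
    """Minimal full-map directory sketch: tracks which cores share a block
    and invalidates sharers on a write. A generic illustration of
    directory-based coherence bookkeeping, not CCNoC's design."""

    def __init__(self):
        self.sharers = {}  # block address -> set of core ids holding it

    def read(self, core, addr):
        self.sharers.setdefault(addr, set()).add(core)
        return f"core {core} gets {hex(addr)} in Shared state"

    def write(self, core, addr):
        others = self.sharers.get(addr, set()) - {core}
        self.sharers[addr] = {core}  # the writer becomes the sole owner
        return f"invalidate {sorted(others)}; core {core} gets {hex(addr)} Modified"

d = Directory()
print(d.read(0, 0x1000))
print(d.read(1, 0x1000))
print(d.write(2, 0x1000))  # invalidates cores 0 and 1
```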
Funding: Supported by the National Key Basic Research Program of China (973 Program, Grant No. 2011CB706804), the Shanghai Municipal Science and Technology Commission of China (Grant No. 11QH1401400), and the Research Project of the State Key Laboratory of Mechanical System & Vibration of China (Grant No. MSVMS201102).
Abstract: High-speed computational performance is usually gained at the cost of large amounts of hardware resource, which restricts the use of high-accuracy algorithms in practical settings where hardware cost is limited. To solve this problem, a novel method for designing a field programmable gate array (FPGA)-based non-uniform rational B-spline (NURBS) interpolator and motion controller, which adopts an embedded multiprocessor technique, is proposed in this study. The hardware and software design of the multiprocessor is presented, with one processor dedicated to NURBS interpolation and the other to position servo control. Performance analysis and experiments on an X-Y table are carried out, and the hardware cost as well as the time consumed by interpolation and motion control is compared with existing methods. The experimental and comparative results indicate that the proposed method reduces the hardware cost by 97.5% while using a higher-accuracy interpolation algorithm within a period of 0.5 ms. The proposed method thus ensures real-time performance and interpolation accuracy while significantly reducing hardware cost, making it practical for industrial applications.
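For readers unfamiliar with NURBS interpolation, the sketch below evaluates a point on a rational B-spline curve using the standard Cox-de Boor recursion. It is a plain software illustration of the underlying mathematics, not the paper's FPGA implementation, and the example curve (a quadratic circular arc) is chosen arbitrarily.

```python
def basis(i, p, u, U):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p
    over knot vector U."""
    if p == 0:
        return 1.0 if U[i] <= u < U[i + 1] else 0.0
    left = 0.0 if U[i + p] == U[i] else \
        (u - U[i]) / (U[i + p] - U[i]) * basis(i, p - 1, u, U)
    right = 0.0 if U[i + p + 1] == U[i + 1] else \
        (U[i + p + 1] - u) / (U[i + p + 1] - U[i + 1]) * basis(i + 1, p - 1, u, U)
    return left + right

def nurbs_point(u, P, W, U, p=2):
    """Evaluate C(u) = sum(N_i(u) w_i P_i) / sum(N_i(u) w_i) for 2D points."""
    num_x = num_y = den = 0.0
    for i, (pt, w) in enumerate(zip(P, W)):
        n = basis(i, p, u, U) * w
        num_x += n * pt[0]
        num_y += n * pt[1]
        den += n
    return (num_x / den, num_y / den)

# Quadratic NURBS arc: control points, weights, clamped knot vector.
P = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
W = [1.0, 0.7071, 1.0]
U = [0, 0, 0, 1, 1, 1]
print(nurbs_point(0.5, P, W, U))  # roughly (1.0, 0.414), the arc midpoint
```

An interpolator would sample `u` at each control period and feed the resulting positions to the servo loop; the paper's contribution lies in doing this in hardware within the 0.5 ms period at low resource cost.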
Abstract: Increasing the life span and efficiency of a multiprocessor system on chip (MPSoC) by reducing power and energy utilization has become a critical chip design challenge for multiprocessor systems. With the advancement of technology, the performance management of the central processing unit (CPU) is changing. Power densities and thermal effects are rising quickly in multi-core embedded technologies due to shrinking chip sizes. When energy consumption reaches a threshold, it creates delays in complementary metal oxide semiconductor (CMOS) circuits and reduces speed by 10%-15%, because excessive on-chip temperature shortens the chip's life cycle. In this paper, we address the scheduling and energy utilization problem by introducing and evaluating an optimal energy-aware earliest deadline first (EA-EDF) scheduling technique for multiprocessor environments with task migration, which enhances performance and efficiency in a multiprocessor system on chip while lowering energy and power consumption. Core selection and task migration prevent the system from reaching its maximum energy utilization while making effective use of the dynamic power management (DPM) policy. As task execution increases, the on-chip temperature and utilization factor (u_i) rise and more power is dissipated. The proposed approach migrates such tasks to cores that produce less heat and consume less power, distributing the load across cores to lower the temperature, and optimizes the duration of idle and sleep times across multiple CPUs. The EA-EDF algorithm was evaluated with an extensive set of experiments and compares favorably with other current techniques: it reduces power and energy consumption by 4.3%-4.7% at utilizations of 6%, 36%, and 46% and operating frequencies of 520 and 624 MHz when compared with other energy-aware methods for MPSoCs. Tasks are scheduled accurately to make the processor energy-efficient by controlling and managing on-chip thermal effects and optimizing the energy consumption of the MPSoC.
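As a simplified, hypothetical sketch of the idea behind energy-aware EDF with migration (not the authors' EA-EDF implementation), the code below orders tasks by deadline and places each on the coolest, least-loaded core, skipping cores above an assumed temperature threshold.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float      # absolute deadline (EDF priority)
    utilization: float   # u_i contribution of the task

@dataclass
class Core:
    cid: int
    temperature: float   # current on-chip temperature, degrees C
    load: float = 0.0    # accumulated utilization on this core

def assign_edf_coolest(tasks, cores, hot_threshold=70.0):
    """Toy energy/thermal-aware EDF placement: take tasks in deadline order
    and put each on the coolest, least-loaded core. hot_threshold is an
    assumed value, not a figure from the paper."""
    placement = {}
    for task in sorted(tasks, key=lambda t: t.deadline):  # EDF ordering
        cool = [c for c in cores if c.temperature < hot_threshold] or cores
        target = min(cool, key=lambda c: (c.load, c.temperature))
        target.load += task.utilization
        placement[task.name] = target.cid
    return placement

cores = [Core(0, 75.0), Core(1, 55.0), Core(2, 60.0)]
tasks = [Task("t1", 10, 0.4), Task("t2", 5, 0.3), Task("t3", 8, 0.2)]
print(assign_edf_coolest(tasks, cores))  # the hot core 0 is avoided
```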
Abstract: Minimizing energy consumption to increase the life span and performance of a multiprocessor system on chip (MPSoC) has become an integral chip design issue for multiprocessor systems. The performance measurement of computational systems is changing with the advancement of technology. Due to shrinking chip sizes, on-chip power densities are increasing rapidly, raising chip temperatures in multi-core embedded technologies. The operating speed of the device decreases when power consumption reaches a threshold that causes delays in complementary metal oxide semiconductor (CMOS) circuits, because high on-chip temperature adversely affects the life span of the chip. In this paper, an energy-aware dynamic power management technique based on energy-aware earliest deadline first (EA-EDF) scheduling is proposed to improve performance and reliability by reducing energy and power consumption in the system on chip (SoC). Dynamic power management (DPM) enables the MPSoC to reduce power and energy consumption by adopting a suitable core configuration for task migration, and task migration avoids peak temperature values in the multicore system. A high utilization factor (u_i) on a central processing unit (CPU) core consumes more energy and raises the on-chip temperature. Our technique switches cores by migrating such tasks to a core that has a lower temperature and is in a low-power state. The proposed EA-EDF scheduling technique migrates load across cores to attain temperature stability among the multiple CPU cores and optimizes the duration of the idle and sleep periods to enable the low-temperature core. The EA-EDF approach reduces utilization and energy consumption compared with other existing methods. The simulation results show a performance improvement of 4.8% at u_i values of 9%, 16%, 23%, and 25% and an operating frequency of 520 MHz compared with other energy-aware techniques for MPSoCs when the fewest tasks are in the running state, allowing more tasks to be scheduled and making the processor energy-efficient by controlling and managing the energy consumption of the MPSoC.
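Since this abstract also leans on dynamic power management, the sketch below shows the classic break-even test behind timeout-based DPM policies: a core is put to sleep only if the predicted idle interval saves more energy than the sleep transition costs. All power and energy numbers are assumed for illustration and are not taken from the paper.

```python
def dpm_decision(idle_time_ms, sleep_power_mw=5.0, idle_power_mw=50.0,
                 transition_energy_uj=400.0):
    """Break-even check for a timeout-based DPM policy (generic illustration,
    assumed numbers): sleeping pays off only if the energy saved while idle
    exceeds the energy spent entering and leaving the sleep state."""
    saved_uj = (idle_power_mw - sleep_power_mw) * idle_time_ms  # mW * ms == uJ
    return "sleep" if saved_uj > transition_energy_uj else "stay idle"

for idle in (2, 10, 50):
    print(f"predicted idle {idle} ms -> {dpm_decision(idle)}")
```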