Cloud computing has taken over the high-performance distributed computing area,and it currently provides on-demand services and resource polling over the web.As a result of constantly changing user service demand,the ...Cloud computing has taken over the high-performance distributed computing area,and it currently provides on-demand services and resource polling over the web.As a result of constantly changing user service demand,the task scheduling problem has emerged as a critical analytical topic in cloud computing.The primary goal of scheduling tasks is to distribute tasks to available processors to construct the shortest possible schedule without breaching precedence restrictions.Assignments and schedules of tasks substantially influence system operation in a heterogeneous multiprocessor system.The diverse processes inside the heuristic-based task scheduling method will result in varying makespan in the heterogeneous computing system.As a result,an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources necessary to lower the makespan.This research introduced a novel efficient scheduling task method in cloud computing systems based on the cooperation search algorithm to tackle an essential task and schedule a heterogeneous cloud computing problem.The basic idea of thismethod is to use the advantages of meta-heuristic algorithms to get the optimal solution.We assess our algorithm’s performance by running it through three scenarios with varying numbers of tasks.The findings demonstrate that the suggested technique beats existingmethods NewGenetic Algorithm(NGA),Genetic Algorithm(GA),Whale Optimization Algorithm(WOA),Gravitational Search Algorithm(GSA),and Hybrid Heuristic and Genetic(HHG)by 7.9%,2.1%,8.8%,7.7%,3.4%respectively according to makespan.展开更多
5G,8K视频等新业务类型不断涌现,使得网络处理器(network processor,NP)的应用场景日趋复杂多样.为满足多样化网络应用在性能、灵活性以及服务质量保证等方面的差异化需求,传统NP试图在片上系统(system on chip,SoC)上集成大量处理器核...5G,8K视频等新业务类型不断涌现,使得网络处理器(network processor,NP)的应用场景日趋复杂多样.为满足多样化网络应用在性能、灵活性以及服务质量保证等方面的差异化需求,传统NP试图在片上系统(system on chip,SoC)上集成大量处理器核、高速缓存、加速器等异质处理资源,提供面向多样化应用场景的敏捷可定制能力.然而,随着摩尔定律和登纳德缩放定律失效问题的逐渐凸显,单片NP芯片研制在研发周期、成本、创新迭代等方面面临巨大挑战,越来越难以为继.针对上述问题,提出新型敏捷可定制NP架构ChipletNP,基于芯粒化(Chiplet)技术解耦异质资源,在充分利用成熟芯片产品及工艺的基础上,通过多个芯粒组合,满足不同应用场景下NP的快速定制和演化发展需求.基于ChipletNP设计实现了一款集成商用CPU、FPGA(field programmable gate array)和自研敏捷交换芯粒的银河衡芯敏捷NP芯片(YHHX-NP).基于该芯片的应用部署与实验结果表明,ChipletNP可支持NP的快速敏捷定制,能够有效承载SRv6(segment routing over IPv6)等新型网络协议与网络功能部署.其中,核心的敏捷交换芯粒相较于同级商用芯片能效比提升2倍以上,延迟控制在2.82μs以内,可以有效支持面向NP的Chiplet统一通信与集成.展开更多
为了提高异构多核处理器平台的计算性能,从任务调度的角度出发,提出了一种使用黄金正弦和莱维飞行机制改进的麻雀搜索算法(Fusion of Golden Sinusoidal and Levy Flight in Sparrow Search Algorithm,GSLF-SSA)来优化异构多核处理器的...为了提高异构多核处理器平台的计算性能,从任务调度的角度出发,提出了一种使用黄金正弦和莱维飞行机制改进的麻雀搜索算法(Fusion of Golden Sinusoidal and Levy Flight in Sparrow Search Algorithm,GSLF-SSA)来优化异构多核处理器的任务调度。通过对异构任务调度的分析,将异构任务建模为DAG(Directed Acyclic Graph)任务模型,通过对其优先级进行随机编码分配,实现了GSLF-SSA算法求解域从连续到离散的映射,使该算法更能适用于异构多核任务调度之中。将DAG任务的最优调度长度作为算法的适应度值进行迭代寻优,通过与目前应用广泛的麻雀搜索算法(SSA)、混合式任务调度算法(IHSSA)、人工蜂群算法(ABC)等多种启发式算法在异构任务调度环境下的实验对比表明,GSLF-SSA能获得更优的调度长度与更短的调度执行时间。展开更多
Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot be carried out efficiently due to the memory wall, which h...Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot be carried out efficiently due to the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high performance computing systems. DFMC integrates management processing ele- ments (MPEs) and computing processing elements (CPEs), which are heterogeneous processor cores for different application features with a unified ISA (instruction set architecture), a unified execution model, and share-memory that supports cache coherence. The DFMC processor can alleviate the memory wall problem by combining a series of cooperative computing techniques of CPEs, such as multi-pattern data stream transfer, efficient register-level communication mechanism, and fast hardware synchronization technique. These techniques are able to improve on-chip data reuse and optimize memory access performance. This paper illustrates an implementation of a full system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the effect of the cooperative computing techniques of CPEs is significant, with DGEMM (double-precision matrix multiplication) achieving an efficiency of 94%, FFT (fast Fourier transform) obtaining a performance of 207 GFLOPS and FDTD (finite-difference time-domain) obtaining a performance of 27 GFLOPS.展开更多
面对物联网的快速发展,需要低延时、高性能的处理器来实现关键数据的传输和保护,同时要提高处理器的硬件安全,减少非法用户对处理器的攻击。结合当前开源第五代精简指令集(Reduced Instruction Set Computing-Five,RISC-V)处理器架构优...面对物联网的快速发展,需要低延时、高性能的处理器来实现关键数据的传输和保护,同时要提高处理器的硬件安全,减少非法用户对处理器的攻击。结合当前开源第五代精简指令集(Reduced Instruction Set Computing-Five,RISC-V)处理器架构优点,与现场可编程门阵列(Field Programmable Gate Array,FPGA)相结合,设计了异构处理器,提出了基于密码的安全启动模型。首先,细化RISC-V异构处理器的体系结构,设计轻量级密码启动安全模型TrustZone,实现处理器性能与安全的平衡,并结合FPGA的优点,实现定制化的专用协议与业务通信。其次,提出当前RISC-V异构处理器可实现的便捷途径,并基于此进行模型搭建和测试验证。验证结果表明,虽然采用TrustZone安全度量后处理器启动时间有所增加,但针对轻量级的处理器应用场景,在增强处理器安全的前提下,该启动时间开销是可以接受的。展开更多
Multi-core homogeneous processors have been widely used to deal with computation-intensive embedded applications. However, with the continuous down scaling of CMOS technology, within-die variations in the manufacturin...Multi-core homogeneous processors have been widely used to deal with computation-intensive embedded applications. However, with the continuous down scaling of CMOS technology, within-die variations in the manufacturing process lead to a significant spread in the operating speeds of cores within homogeneous multi-core processors. Task scheduling approaches, which do not consider such heterogeneity caused by within-die variations,can lead to an overly pessimistic result in terms of performance. To realize an optimal performance according to the actual maximum clock frequencies at which cores can run, we present a heterogeneity-aware schedule refining(HASR) scheme by fully exploiting the heterogeneities of homogeneous multi-core processors in embedded domains.We analyze and show how the actual maximum frequencies of cores are used to guide the scheduling. In the scheme,representative chip operating points are selected and the corresponding optimal schedules are generated as candidate schedules. During the booting of each chip, according to the actual maximum clock frequencies of cores, one of the candidate schedules is bound to the chip to maximize the performance. A set of applications are designed to evaluate the proposed scheme. Experimental results show that the proposed scheme can improve the performance by an average value of 22.2%, compared with the baseline schedule based on the worst case timing analysis. Compared with the conventional task scheduling approach based on the actual maximum clock frequencies, the proposed scheme also improves the performance by up to 12%.展开更多
文摘Cloud computing has taken over the high-performance distributed computing area,and it currently provides on-demand services and resource polling over the web.As a result of constantly changing user service demand,the task scheduling problem has emerged as a critical analytical topic in cloud computing.The primary goal of scheduling tasks is to distribute tasks to available processors to construct the shortest possible schedule without breaching precedence restrictions.Assignments and schedules of tasks substantially influence system operation in a heterogeneous multiprocessor system.The diverse processes inside the heuristic-based task scheduling method will result in varying makespan in the heterogeneous computing system.As a result,an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources necessary to lower the makespan.This research introduced a novel efficient scheduling task method in cloud computing systems based on the cooperation search algorithm to tackle an essential task and schedule a heterogeneous cloud computing problem.The basic idea of thismethod is to use the advantages of meta-heuristic algorithms to get the optimal solution.We assess our algorithm’s performance by running it through three scenarios with varying numbers of tasks.The findings demonstrate that the suggested technique beats existingmethods NewGenetic Algorithm(NGA),Genetic Algorithm(GA),Whale Optimization Algorithm(WOA),Gravitational Search Algorithm(GSA),and Hybrid Heuristic and Genetic(HHG)by 7.9%,2.1%,8.8%,7.7%,3.4%respectively according to makespan.
文摘5G,8K视频等新业务类型不断涌现,使得网络处理器(network processor,NP)的应用场景日趋复杂多样.为满足多样化网络应用在性能、灵活性以及服务质量保证等方面的差异化需求,传统NP试图在片上系统(system on chip,SoC)上集成大量处理器核、高速缓存、加速器等异质处理资源,提供面向多样化应用场景的敏捷可定制能力.然而,随着摩尔定律和登纳德缩放定律失效问题的逐渐凸显,单片NP芯片研制在研发周期、成本、创新迭代等方面面临巨大挑战,越来越难以为继.针对上述问题,提出新型敏捷可定制NP架构ChipletNP,基于芯粒化(Chiplet)技术解耦异质资源,在充分利用成熟芯片产品及工艺的基础上,通过多个芯粒组合,满足不同应用场景下NP的快速定制和演化发展需求.基于ChipletNP设计实现了一款集成商用CPU、FPGA(field programmable gate array)和自研敏捷交换芯粒的银河衡芯敏捷NP芯片(YHHX-NP).基于该芯片的应用部署与实验结果表明,ChipletNP可支持NP的快速敏捷定制,能够有效承载SRv6(segment routing over IPv6)等新型网络协议与网络功能部署.其中,核心的敏捷交换芯粒相较于同级商用芯片能效比提升2倍以上,延迟控制在2.82μs以内,可以有效支持面向NP的Chiplet统一通信与集成.
文摘为了提高异构多核处理器平台的计算性能,从任务调度的角度出发,提出了一种使用黄金正弦和莱维飞行机制改进的麻雀搜索算法(Fusion of Golden Sinusoidal and Levy Flight in Sparrow Search Algorithm,GSLF-SSA)来优化异构多核处理器的任务调度。通过对异构任务调度的分析,将异构任务建模为DAG(Directed Acyclic Graph)任务模型,通过对其优先级进行随机编码分配,实现了GSLF-SSA算法求解域从连续到离散的映射,使该算法更能适用于异构多核任务调度之中。将DAG任务的最优调度长度作为算法的适应度值进行迭代寻优,通过与目前应用广泛的麻雀搜索算法(SSA)、混合式任务调度算法(IHSSA)、人工蜂群算法(ABC)等多种启发式算法在异构任务调度环境下的实验对比表明,GSLF-SSA能获得更优的调度长度与更短的调度执行时间。
文摘Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot be carried out efficiently due to the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high performance computing systems. DFMC integrates management processing ele- ments (MPEs) and computing processing elements (CPEs), which are heterogeneous processor cores for different application features with a unified ISA (instruction set architecture), a unified execution model, and share-memory that supports cache coherence. The DFMC processor can alleviate the memory wall problem by combining a series of cooperative computing techniques of CPEs, such as multi-pattern data stream transfer, efficient register-level communication mechanism, and fast hardware synchronization technique. These techniques are able to improve on-chip data reuse and optimize memory access performance. This paper illustrates an implementation of a full system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the effect of the cooperative computing techniques of CPEs is significant, with DGEMM (double-precision matrix multiplication) achieving an efficiency of 94%, FFT (fast Fourier transform) obtaining a performance of 207 GFLOPS and FDTD (finite-difference time-domain) obtaining a performance of 27 GFLOPS.
基金Project supported by the National Natural Science Foundation of China(Nos.6122500861373074+3 种基金and 61373090)the National Basic Research Program(973)of China(No.2014CB349304)the Specialized Research Fund for the Doctoral Program of Higher Education,the Ministry of Education of China(No.20120002110033)the Tsinghua University Initiative Scientific Research Program
文摘Multi-core homogeneous processors have been widely used to deal with computation-intensive embedded applications. However, with the continuous down scaling of CMOS technology, within-die variations in the manufacturing process lead to a significant spread in the operating speeds of cores within homogeneous multi-core processors. Task scheduling approaches, which do not consider such heterogeneity caused by within-die variations,can lead to an overly pessimistic result in terms of performance. To realize an optimal performance according to the actual maximum clock frequencies at which cores can run, we present a heterogeneity-aware schedule refining(HASR) scheme by fully exploiting the heterogeneities of homogeneous multi-core processors in embedded domains.We analyze and show how the actual maximum frequencies of cores are used to guide the scheduling. In the scheme,representative chip operating points are selected and the corresponding optimal schedules are generated as candidate schedules. During the booting of each chip, according to the actual maximum clock frequencies of cores, one of the candidate schedules is bound to the chip to maximize the performance. A set of applications are designed to evaluate the proposed scheme. Experimental results show that the proposed scheme can improve the performance by an average value of 22.2%, compared with the baseline schedule based on the worst case timing analysis. Compared with the conventional task scheduling approach based on the actual maximum clock frequencies, the proposed scheme also improves the performance by up to 12%.