期刊文献+
共找到143篇文章
< 1 2 8 >
每页显示 20 50 100
Improving the Off-chip Bandwidth Utilization of Chip-Multiprocessors Using Early Write-Back
1
作者 Mutaz A1-Tarawneh NazeihBotros 《通讯和计算机(中英文版)》 2013年第1期33-41,共9页
关键词 带宽利用率 多处理器 早期 芯片 二级高速缓存 需求获取 层次结构 主存储器
下载PDF
基于CMP的多种并行蚁群算法及比较 被引量:3
2
作者 何丽莉 王克淼 +1 位作者 白洪涛 胡成全 《吉林大学学报(理学版)》 CAS CSCD 北大核心 2010年第5期787-792,共6页
基于片上多核处理器(Chip Multi-processor,CMP)的多种并行蚁群算法,包括并行最大最小蚂蚁系统、并行蚁群系统及两者的混合等5个并行算法,提出一种在CMP的每个处理器核心上模拟一个子蚁群,整体蚁群共享同一信息素矩阵,实现信息素隐式交... 基于片上多核处理器(Chip Multi-processor,CMP)的多种并行蚁群算法,包括并行最大最小蚂蚁系统、并行蚁群系统及两者的混合等5个并行算法,提出一种在CMP的每个处理器核心上模拟一个子蚁群,整体蚁群共享同一信息素矩阵,实现信息素隐式交流的方法.用多线程实时优先级实现该算法,并用若干旅行商问题实例进行了测试,分析了不同并行策略的影响.测试结果表明,基于CMP的并行蚁群具有相对于核心数目的线性加速比,异种蚁群混合策略在解的稳定性上更具优势。 展开更多
关键词 蚁群优化 共享信息素矩阵 并行计算 片上多核处理器
下载PDF
一种基于容量复用的异构CMP Cache 被引量:2
3
作者 高翔 章隆兵 胡伟武 《计算机研究与发展》 EI CSCD 北大核心 2008年第5期877-885,共9页
多核环境下的Cache设计技术受到线延时和应用等多方面因素影响,私有和共享方案都存在各自的不足.提出了一种异构的CMP Cache结构,采用两类具有不同Cache层次的结点组成多核芯片,设计了基于间接索引的Cache容量复用等技术,提供了容量有... 多核环境下的Cache设计技术受到线延时和应用等多方面因素影响,私有和共享方案都存在各自的不足.提出了一种异构的CMP Cache结构,采用两类具有不同Cache层次的结点组成多核芯片,设计了基于间接索引的Cache容量复用等技术,提供了容量有效且访问迅速的片上存储层次.在全系统环境下对SPEC CPU2000,SPLASH2等程序的评测结果表明,异构CMP Cache结构能够适应各类应用的需要,对单进程和多线程应用平均性能提高分别可达16%和9%.异构CMP Cache同时具有硬件设计简单的特点,具有较好的工程可实现性,其设计思想将应用在未来的龙芯多核处理器设计中. 展开更多
关键词 片上多核处理器 存储层次 异构 容量复用 高速缓存一致性
下载PDF
氮化镓晶片的CMP技术现状与趋势 被引量:6
4
作者 熊朋 王铮 +2 位作者 陈威 史霄 杨师 《电子工业专用设备》 2016年第1期10-14,共5页
综述了半导体材料氮化镓(Ga N)抛光技术的发展,介绍了Ga N晶片化学机械平坦化(Chemical Mechanical Planarization,CMP)技术的研究现状,分析了CMP的原理和工艺参数对抛光的影响,指出了CMP亟待解决的技术和理论问题,并对其发展方向进行... 综述了半导体材料氮化镓(Ga N)抛光技术的发展,介绍了Ga N晶片化学机械平坦化(Chemical Mechanical Planarization,CMP)技术的研究现状,分析了CMP的原理和工艺参数对抛光的影响,指出了CMP亟待解决的技术和理论问题,并对其发展方向进行了展望。 展开更多
关键词 GaN晶片 化学机械抛光 抛光工艺
下载PDF
CCSim:基于Pin的CMPCache访问模拟器
5
作者 郑启龙 栾俊 +1 位作者 房明 吴晓伟 《微电子学与计算机》 CSCD 北大核心 2008年第10期5-7,11,共4页
随着芯片集成制造工艺的日益发展,拥有多级Cache的片上多处理器(CMP)已成为桌面应用和高端计算的主流平台.为了优化程序在CMP下运行性能,文中以Pin工具软件为基础,提出并设计了一个面向CMP体系架构的多级Cache访问模拟器——CCSim.该模... 随着芯片集成制造工艺的日益发展,拥有多级Cache的片上多处理器(CMP)已成为桌面应用和高端计算的主流平台.为了优化程序在CMP下运行性能,文中以Pin工具软件为基础,提出并设计了一个面向CMP体系架构的多级Cache访问模拟器——CCSim.该模拟器不仅可以模拟同构CMP下传统方式的Cache访问,而且还可以对CMP中最后一级共享Cache的竞争访问以及非传统方式的Barcelona式Cache访问模式进行模拟分析. 展开更多
关键词 PIN Cache模拟器 片上多处理器
下载PDF
基于混合粒子群优化的CMP线程调度方法 被引量:1
6
作者 李静梅 张博 《计算机工程》 CAS CSCD 2012年第20期113-115,共3页
为提高片上多核处理器(CMP)架构中线程调度的执行效率,发挥CMP的并行性能,提出一种基于混合粒子群优化算法的线程调度方法。根据设计的线程调度模型,利用有向无环图表述线程及线程间的相互依赖关系,并采用改进的混合粒子群算法对其进行... 为提高片上多核处理器(CMP)架构中线程调度的执行效率,发挥CMP的并行性能,提出一种基于混合粒子群优化算法的线程调度方法。根据设计的线程调度模型,利用有向无环图表述线程及线程间的相互依赖关系,并采用改进的混合粒子群算法对其进行合理调度。实验结果表明,该方法的执行效率优于现有的遗传算法,能有效地降低任务的执行时间,充分发挥多核架构的优势。 展开更多
关键词 片上多核处理器 线程调度 粒子群优化算法 全局最优 局部最优 有向无环图 调度方法
下载PDF
基于CMP的高密度计算机多目标设计方法 被引量:5
7
作者 刘宇航 祝明发 +1 位作者 崔吉顺 肖利民 《系统工程与电子技术》 EI CSCD 北大核心 2012年第4期806-812,共7页
面向高端应用的高效能计算机一般具有高性能、高集成度、高热密度、高复杂性的特点,其研制是一项复杂的系统工程。每一环节,存在功能、性能、可靠性等相互制约但需同时兼顾的多个目标。在实践中这些方面的权衡设计如何以有序的方式展开... 面向高端应用的高效能计算机一般具有高性能、高集成度、高热密度、高复杂性的特点,其研制是一项复杂的系统工程。每一环节,存在功能、性能、可靠性等相互制约但需同时兼顾的多个目标。在实践中这些方面的权衡设计如何以有序的方式展开,是一个亟待解决的关键问题。提出了可靠性与功能、性能权衡的设计方法,并应用到一款基于国产多核处理器的16路高密度计算机的自主研制中,软件仿真分析和系统实测验证了该权衡设计方法的有效性。 展开更多
关键词 高密度计算机 高能效 多目标 权衡 协同设计 片上多核
下载PDF
CMP中基于目录的协作Cache设计方案 被引量:1
8
作者 赵小雨 吴俊敏 +2 位作者 隋秀峰 王庆波 唐轶轩 《计算机工程》 CAS CSCD 北大核心 2010年第21期283-285,共3页
片上多处理器中二级Cache的设计和管理是影响其性能的关键因素之一。在私有二级Cache的基础上,提出一种基于集中式一致性目录的协作Cache设计方案,通过有效地管理片上存储资源来优化处理器的性能,从而使该协作Cache具有平均访存延迟小、... 片上多处理器中二级Cache的设计和管理是影响其性能的关键因素之一。在私有二级Cache的基础上,提出一种基于集中式一致性目录的协作Cache设计方案,通过有效地管理片上存储资源来优化处理器的性能,从而使该协作Cache具有平均访存延迟小、Cache缺失率低、可扩展性好等优点。实验结果显示,与共享二级Cache设计相比,协作Cache可以将4核处理器的吞吐量平均提高13.5%,而其硬件开销约为8.1%。 展开更多
关键词 协作Cache 集中式一致性目录 片上多处理器 流感知
下载PDF
GaN晶片的CMP加工工艺研究 被引量:5
9
作者 孙强 吴健 李伟 《轻工机械》 CAS 2011年第5期48-51,56,共5页
研究了GaN晶片的化学机械抛光(CMP)工艺,分析了在CMP加工工艺过程中对GaN晶片表面质量所产生的影响。实验采用质量分数30%的H2O2溶液与铁经过Fenton化学反应20 min后作为抛光液,并分别利用2种不同磨粒粒度W5和W0.5对晶片表面抛光,对不... 研究了GaN晶片的化学机械抛光(CMP)工艺,分析了在CMP加工工艺过程中对GaN晶片表面质量所产生的影响。实验采用质量分数30%的H2O2溶液与铁经过Fenton化学反应20 min后作为抛光液,并分别利用2种不同磨粒粒度W5和W0.5对晶片表面抛光,对不同工艺参数加工后的晶片表面进行测试分析,并推测加工过程中晶片表面可能发生的化学反应。 展开更多
关键词 化学机械抛光 GaN晶片 FENTON反应
下载PDF
一种CMP结构上的事务存储编程模型设计 被引量:2
10
作者 陈嘉 安虹 +1 位作者 刘圆 王莉 《计算机仿真》 CSCD 2007年第6期81-85,共5页
多核结构上采用由用户显式制导的并行程序设计模型,使用锁和同步变量来实现同步。事务存储模型能够解决由锁机制带来的一系列问题,提高程序的并发性。介绍了在文中提出的一种基于事务存储模型的多核结构(Transactional-Memory based Chi... 多核结构上采用由用户显式制导的并行程序设计模型,使用锁和同步变量来实现同步。事务存储模型能够解决由锁机制带来的一系列问题,提高程序的并发性。介绍了在文中提出的一种基于事务存储模型的多核结构(Transactional-Memory based Chip Multiple-Superscaler,TMCMS)上的并行编程模型,以及针对循环程序的执行模型;以FFT程序为例具体介绍了循环结构的并行化方法和编译转换过程。在初步的实验中,将处理单元从1增加到16个时,在所设计的编程模型的支持下,IPC(Instruction PerCycle)有接近线性的增长,说明该并行编程模型能够充分发掘程序中潜在的细粒度线程级并行性,同时保持并行程序设计的简单性。 展开更多
关键词 多核芯片结构 并行程序设计模型 事务存储
下载PDF
一种面向CMP的可变相联度混合Cache结构 被引量:1
11
作者 晏沛湘 杨先炬 张民选 《电子学报》 EI CAS CSCD 北大核心 2011年第3期656-659,共4页
以V-Way Cache结构为原型,提出一种面向CMP的可变相联度混合Cache结构CMP-VH.CMP-VH将最后一级片上Cache划分成一种优化的私有/共享结构,Tag私有,数据部分私有部分共享.采用基于数据块的重用信息替换策略,提供显式和隐式两种机制在核间... 以V-Way Cache结构为原型,提出一种面向CMP的可变相联度混合Cache结构CMP-VH.CMP-VH将最后一级片上Cache划分成一种优化的私有/共享结构,Tag私有,数据部分私有部分共享.采用基于数据块的重用信息替换策略,提供显式和隐式两种机制在核间对共享数据进行容量划分.并行程序负载SPLASH-2的模拟实验结果表明,CMP-VH具有比单一的私有/共享结构更好的整体性能. 展开更多
关键词 片上多核处理器 混合Cache结构 Reuse替换策略
下载PDF
Development of FPGA Based NURBS Interpolator and Motion Controller with Multiprocessor Technique 被引量:2
12
作者 ZHAO Huan ZHU Limin +1 位作者 XIONG Zhenhua DING Han 《Chinese Journal of Mechanical Engineering》 SCIE EI CAS CSCD 2013年第5期940-947,共8页
The high-speed computational performance is gained at the cost of huge hardware resource,which restricts the application of high-accuracy algorithms because of the limited hardware cost in practical use.To solve the p... The high-speed computational performance is gained at the cost of huge hardware resource,which restricts the application of high-accuracy algorithms because of the limited hardware cost in practical use.To solve the problem,a novel method for designing the field programmable gate array(FPGA)-based non-uniform rational B-spline(NURBS) interpolator and motion controller,which adopts the embedded multiprocessor technique,is proposed in this study.The hardware and software design for the multiprocessor,one of which is for NURBS interpolation and the other for position servo control,is presented.Performance analysis and experiments on an X-Y table are carried out,hardware cost as well as consuming time for interpolation and motion control is compared with the existing methods.The experimental and comparing results indicate that,compared with the existing methods,the proposed method can reduce the hardware cost by 97.5% using higher-accuracy interpolation algorithm within the period of 0.5 ms.A method which ensures the real-time performance and interpolation accuracy,and reduces the hardware cost significantly is proposed,and it’s practical in the use of industrial application. 展开更多
关键词 NURBS interpolator FPGA-based interpolation multiprocessor system on a programmable chip (SOPC) motion controller
下载PDF
Performance modeling of positive degraded task-pair with helper-thread in CMP
13
作者 Gu Zhimin Zheng Ninghan +3 位作者 Zhang Yi Liu Changding Tang Jie Huang Yan 《High Technology Letters》 EI CAS 2010年第3期221-226,共6页
Helper-thread of a task can hide the memory access time of irregular data on the chip muhi-core processor (CMP). For constructing a compiler that effectively supports the helper-thread of a task in the multi-core sc... Helper-thread of a task can hide the memory access time of irregular data on the chip muhi-core processor (CMP). For constructing a compiler that effectively supports the helper-thread of a task in the multi-core scenario based on the last level shared cache, this paper studies its performance stable condi- tions. Unfortunately, there is no existing model that allows extensive investigation of the impact of stable conditions, we present the base of pre-computation that is formalized by our degraded task-pair 〈 T, T' 〉 with the helper-thread, and its stable conditions are analyzed. Finally, a novel performance model and a constructing method of pre-computation based on our positive degraded task-pair are proposed. The efficient results are shown by our experiments. If we further exploit memory level parallelism (MLP) for our task-pair, the task-pair 〈 T, T' 〉 can reach better performance. 展开更多
关键词 chip multi-core processor (cmp helper-thread pre-computation performance model
下载PDF
Fault-Tolerant Design Techniques in ACMP Architecture
14
作者 YAOWen.bin WANGDong-sheng 《Wuhan University Journal of Natural Sciences》 CAS 2005年第1期5-8,共4页
Single-chip multiprocessor (CMP) combined with the fault-loleranl(FT)techniques offers an ideal architecture to achieve high availability on the basis of sustaining highcomputing performance FT design of a single-chip... Single-chip multiprocessor (CMP) combined with the fault-loleranl(FT)techniques offers an ideal architecture to achieve high availability on the basis of sustaining highcomputing performance FT design of a single-chip multiprocessor is described, including thetechniques from hard-wart redundancy to software support and firmware strategy. The design aims atmasking the influences of errors and automatically correcting the system states. 展开更多
关键词 computer architecture fault-tolerant design single-chip multiprocessor
下载PDF
环连接CMP模拟器:Godson-Ring
15
作者 曹非 《计算机工程与应用》 CSCD 2013年第9期13-18,49,共7页
片上互连结构和cache一致性协议是片上多核处理器(CMP)设计的关键。为了探索使用环形互连结构CMP的cache一致性协议设计空间,需要使用对环形互连结构和cache一致性协议进行精确模拟的CMP模拟器平台。Godson-Ring是一个环连接CMP的用户... 片上互连结构和cache一致性协议是片上多核处理器(CMP)设计的关键。为了探索使用环形互连结构CMP的cache一致性协议设计空间,需要使用对环形互连结构和cache一致性协议进行精确模拟的CMP模拟器平台。Godson-Ring是一个环连接CMP的用户态模拟器平台,采用功能和时序相分离的模拟方式,使用了事件驱动和执行驱动相结合的方法,周期精确地模拟了环形互连结构和cache一致性协议的硬件行为。该模拟器具有速度快和灵活性高的特点,能模拟多种cache一致性协议,可以快速、有效地探索环连接CMP的cache一致性协议设计空间。 展开更多
关键词 片上多核处理器 CACHE一致性协议 模拟器
下载PDF
A Resource-Efficient Communication Architecture for Chip Multiprocessors on FPGAs 被引量:1
16
作者 Maggie Swetha Thota 《Journal of Computer Science & Technology》 SCIE EI CSCD 2011年第3期434-447,共14页
Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, an efficient communication network that m... Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, an efficient communication network that matches the needs of the target application is always critical to the overall performance. Wormhole packet-switching network-on-chip (NoC) solutions are replacing conventional shared buses to deal with scalability and complexity challenges coming along with the increasing number of processing elements (PEs). However, the quest for high performance networks has led to very complex and resource-expensive NoC designs, leaving little room for the real computing force, i.e., PEs. Moreover, many techniques offer very small performance gains or none at all when network traffic is light while increasing the resource usage of routers. We argue that computation is still the primary task of multiprocessors and sufficient resources should be reserved for PEs. This paper presents our novel design and implementation of a resource-efficient communication network for multiprocessors on FPGAs. We reduce not only the required number of routers for a given number of PEs by introducing a new PE-router topology, but also the resource requirement of each router. Our communication network relies on the NEWS channels to transfer packets in a pipelined fashion following the path determined by the routing network, The implementation results on various Xilinx FPGAs show good performance in the typical range of network load for multiprocessor applications. 展开更多
关键词 chip multiprocessors FPGA network on chip mesh topology resource efficient
原文传递
CCNoC:Cache-Coherent Network on Chip for Chip Multiprocessors 被引量:1
17
作者 王惊雷 薛一波 +4 位作者 Member, CCF, IEEE 王海霞 李崇民 汪东升 Senior Member,CCF 《Journal of Computer Science & Technology》 SCIE EI CSCD 2010年第2期257-266,共10页
As the number of cores in chip multiprocessors (CMPs) increases, cache coherence protocol has become a key issue in integration of chip multiprocessors. Supporting cache coherence protocol in large chip multiprocess... As the number of cores in chip multiprocessors (CMPs) increases, cache coherence protocol has become a key issue in integration of chip multiprocessors. Supporting cache coherence protocol in large chip multiprocessors still faces three hurdles: design complexity, performance and scalability. This paper proposes Cache Coherent Network on Chip (CCNoC), a scheme that decouples cache coherency maintenance from processors and shared L2 caches and implements it completely in network on chip to free up processors and shared L2 caches from the chore of maintaining coherency, thereby reduces design complexity of CMPs. In this way, CCNoC also improves the performance of cache coherence protocol through reducing directory access latency and enhances scalability by avoiding massive directories overhead in shared L2 caches. In CCNoC, coherence state caches and active directory caches are implemented in the network interface components of network on chip to maintain cache coherence states for blocks in L1 caches and manage directory information for recently accessed blocks in L2 caches respectively. CCNoC provides a scalable CMP framework to tackle cache coherency which is the foundation of CMP. This paper evaluates the performance of CCNoC. Experimental results show that for a 16-core system, CCNoC improves performance by 3% on average over the conventional chip multiprocessor and by 10% at best, while reduces storage overhead by 1.8% and saves directory storage by 88%, showing good scalability. 展开更多
关键词 chip multiprocessor network on chip cache coherence protocol
原文传递
Energy-Efficient Scheduling Based on Task Migration Policy Using DPM for Homogeneous MPSoCs
18
作者 Hamayun Khan Irfan Ud din +1 位作者 Arshad Ali Sami Alshmrany 《Computers, Materials & Continua》 SCIE EI 2023年第1期965-981,共17页
Increasing the life span and efficiency of Multiprocessor System on Chip(MPSoC)by reducing power and energy utilization has become a critical chip design challenge for multiprocessor systems.With the advancement of te... Increasing the life span and efficiency of Multiprocessor System on Chip(MPSoC)by reducing power and energy utilization has become a critical chip design challenge for multiprocessor systems.With the advancement of technology,the performance management of central processing unit(CPU)is changing.Power densities and thermal effects are quickly increasing in multi-core embedded technologies due to shrinking of chip size.When energy consumption reaches a threshold that creates a delay in complementary metal oxide semiconductor(CMOS)circuits and reduces the speed by 10%–15%because excessive on-chip temperature shortens the chip’s life cycle.In this paper,we address the scheduling&energy utilization problem by introducing and evaluating an optimal energy-aware earliest deadline first scheduling(EA-EDF)based technique formultiprocessor environments with task migration that enhances the performance and efficiency in multiprocessor systemon-chip while lowering energy and power consumption.The selection of core andmigration of tasks prevents the system from reaching itsmaximumenergy utilization while effectively using the dynamic power management(DPM)policy.Increase in the execution of tasks the temperature and utilization factor(u_(i))on-chip increases that dissipate more power.The proposed approach migrates such tasks to the core that produces less heat and consumes less power by distributing the load on other cores to lower the temperature and optimizes the duration of idle and sleep times across multiple CPUs.The performance of the EA-EDF algorithm was evaluated by an extensive set of experiments,where excellent results were reported when compared to other current techniques,the efficacy of the proposed methodology reduces the power and energy consumption by 4.3%–4.7%on a utilization of 6%,36%&46%at 520&624 MHz operating frequency when particularly in comparison to other energy-aware methods for MPSoCs.Tasks are running and accurately scheduled to make an energy-efficient processor by controlling and managing the thermal effects on-chip and optimizing the energy consumption of MPSoCs. 展开更多
关键词 Dynamic power management dynamic voltage&frequency scaling dynamic thermal management multiprocessor system on chip complementary metal oxide semiconductor reliability
下载PDF
An Optimal DPM Based Energy-Aware Task Scheduling for Performance Enhancement in Embedded MPSoC
19
作者 Hamayun Khan Irfan Ud Din +1 位作者 Arshad Ali Mohammad Husain 《Computers, Materials & Continua》 SCIE EI 2023年第1期2097-2113,共17页
Minimizing the energy consumption to increase the life span and performance of multiprocessor system on chip(MPSoC)has become an integral chip design issue for multiprocessor systems.The performance measurement of com... Minimizing the energy consumption to increase the life span and performance of multiprocessor system on chip(MPSoC)has become an integral chip design issue for multiprocessor systems.The performance measurement of computational systems is changing with the advancement in technology.Due to shrinking and smaller chip size power densities onchip are increasing rapidly that increasing chip temperature in multi-core embedded technologies.The operating speed of the device decreases when power consumption reaches a threshold that causes a delay in complementary metal oxide semiconductor(CMOS)circuits because high on-chip temperature adversely affects the life span of the chip.In this paper an energy-aware dynamic power management technique based on energy aware earliest deadline first(EA-EDF)scheduling is proposed for improving the performance and reliability by reducing energy and power consumption in the system on chip(SOC).Dynamic power management(DPM)enables MPSOC to reduce power and energy consumption by adopting a suitable core configuration for task migration.Task migration avoids peak temperature values in the multicore system.High utilization factor(ui)on central processing unit(CPU)core consumes more energy and increases the temperature on-chip.Our technique switches the core bymigrating such task to a core that has less temperature and is in a low power state.The proposed EA-EDF scheduling technique migrates load on different cores to attain stability in temperature among multiple cores of the CPU and optimized the duration of the idle and sleep periods to enable the low-temperature core.The effectiveness of the EA-EDF approach reduces the utilization and energy consumption compared to other existing methods and works.The simulation results show the improvement in performance by optimizing 4.8%on u_(i) 9%,16%,23%and 25%at 520 MHz operating frequency as compared to other energy-aware techniques for MPSoCs when the least number of tasks is in running state and can schedule more tasks to make an energy-efficient processor by controlling and managing the energy consumption of MPSoC. 展开更多
关键词 Dynamic power management dynamic voltage&frequency scaling dynamic thermal management multiprocessor system on chip complementary metal oxide semiconductor reliability
下载PDF
一种分片式多核处理器的用户级模拟器 被引量:6
20
作者 黄琨 马可 +2 位作者 曾洪博 张戈 章隆兵 《软件学报》 EI CSCD 北大核心 2008年第4期1069-1080,共12页
随着片上晶体管资源的增多和互连线延迟的加大,分片式多核微处理器已成为多核处理器设计的新方向.为了对这种新型处理器进行体系结构的深入研究和设计空间的探索,设计并实现了针对分片式多核处理器的用户级多核性能模拟器.该多核模拟器... 随着片上晶体管资源的增多和互连线延迟的加大,分片式多核微处理器已成为多核处理器设计的新方向.为了对这种新型处理器进行体系结构的深入研究和设计空间的探索,设计并实现了针对分片式多核处理器的用户级多核性能模拟器.该多核模拟器在龙芯2号单处理器核的基础上,完整地模拟了基于目录的Cache一致性协议和存储转发式片上互联网络的结构模型,详细地刻画了由于系统乱序处理各种请求应答和请求之间的冲突而造成的时序特性,可以通过运行各种串行或并行的工作负载对多核处理器的各种重要性能指标加以评估,为多核处理器的结构设计提供了快速、灵活、高效的研究平台. 展开更多
关键词 分片式cmp(chip multiprocessor) 模拟器 片上网络 性能分析 龙芯2号微处理器
下载PDF
上一页 1 2 8 下一页 到第
使用帮助 返回顶部