17 articles found
Multi-core optimization for conjugate gradient benchmark on heterogeneous processors
1
Authors: 邓林, 窦勇. Journal of Central South University (SCIE, EI, CAS), 2011, No. 2, pp. 490-498 (9 pages)
Developing parallel applications on heterogeneous processors faces the challenge of the 'memory wall', due to the limited capacity of local storage, limited bandwidth, and long memory access latency. To address this problem, a parallelization approach with six memory optimization schemes for CG is proposed, four of which target all kinds of sparse matrix-vector multiplication (SPMV) operations. On an IBM QS20, the approach reaches speedups of up to 21 and 133 times for sizes A and B, respectively, compared with a single power processor element. The conclusion is drawn that the peak memory access bandwidth of the Cell BE can be obtained in SPMV, that simple computation is more efficient on heterogeneous processors, and that loop unrolling can hide local storage access latency while executing scalar operations on SIMD cores.
Keywords: multi-core processor NAS parallelization CG memory optimization
Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on Multi-Core Processor
2
Authors: Zhang Ziran, Li Jun, Li Changxiao (ZTE Corporation, Shenzhen 518057, P.R. China). ZTE Communications, 2009, No. 1, pp. 54-58 (5 pages)
The Long Term Evolution (LTE) system imposes strict requirements on dispatching delay. Moreover, the very high air interface rate of LTE requires strong processing capability from the devices that process the baseband signals. Consequently, a single-core processor cannot meet the requirements of an LTE system. This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing. The test results show that this approach works quite well.
Keywords: CORE LTE Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on multi-core processor Design
Shared Cache Based on Content Addressable Memory in a Multi-Core Architecture
3
Authors: Allam Abumwais, Mahmoud Obaid. Computers, Materials & Continua (SCIE, EI), 2023, No. 3, pp. 4951-4963 (13 pages)
Modern shared-memory multi-core processors typically have shared Level 2 (L2) or Level 3 (L3) caches. Cache bottlenecks and replacement strategies are the main problems of such architectures, where multiple cores try to access the shared cache simultaneously. The main problem in improving memory performance is the shared cache architecture and cache replacement. This paper documents the implementation of a Dual-Port Content Addressable Memory (DPCAM) and a modified Near-Far Access Replacement Algorithm (NFRA), which was previously proposed as a shared L2 cache layer in a multi-core processor. Standard Performance Evaluation Corporation (SPEC) Central Processing Unit (CPU) 2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer. Results show improved performance of the multicore processor's DPCAM and NFRA algorithms, corresponding to a higher number of concurrent accesses to shared memory. The new architecture significantly increases system throughput and records performance improvements of up to 8.7% on various types of SPEC 2006 benchmarks. The miss rate is also improved by about 13%, with some exceptions in the sphinx3 and bzip2 benchmarks. These results could open a new window for solving the long-standing problems with shared cache in multi-core processors.
Keywords: multi-core processor shared cache content addressable memory dual port CAM replacement algorithm benchmark program
System Architecture of Godson-3 Multi-Core Processors (cited by 7)
4
Authors: 高翔, 陈云霁, 王焕东, 唐丹, 胡伟武. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2010, No. 2, pp. 181-191 (11 pages)
Godson-3 is the latest generation of the Godson microprocessor family. It adopts a scalable multi-core architecture with hardware support for accelerating applications, including X86 emulation and signal processing. This paper introduces the system architecture of Godson-3 from various aspects, including system scalability, organization of the memory hierarchy, network-on-chip, inter-chip connection, and the I/O subsystem.
Keywords: multi-core processor scalable interconnection cache coherent non-uniform memory access/non-uniform cache access (CC-NUMA/NUCA) MESH CROSSBAR cache coherence reliability availability and serviceability (RAS)
Parallel computing of discrete element method on multi-core processors (cited by 6)
5
Authors: Yusuke Shigeto, Mikio Sakai. Particuology (SCIE, EI, CAS, CSCD), 2011, No. 4, pp. 398-405 (8 pages)
This paper describes parallel simulation techniques for the discrete element method (DEM) on multi-core processors. Recently, multi-core CPU and GPU processors have attracted much attention in accelerating computer simulations in various fields. We propose a new algorithm for multi-thread parallel computation of DEM, which makes effective use of the available memory and accelerates the computation. This study shows that memory usage is drastically reduced by using this algorithm. To show the practical use of DEM in industry, a large-scale powder system is simulated with a complicated drive unit. We compared the performance of the simulation between the latest GPU and CPU processors with optimized programs for each processor. The results show that the difference in performance is not substantial when using either GPUs or CPUs with a multi-thread parallel algorithm. In addition, the DEM algorithm is shown to have high scalability in a multi-thread parallel computation on a CPU.
Keywords: Discrete element method Parallel computing multi-core processor GPGPU
Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity
6
Authors: Zhi-xiang CHEN, Zhao-lin LI, Shan CAO, Fang WANG, Jie ZHOU. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2015, No. 12, pp. 1018-1033 (16 pages)
Multi-core homogeneous processors have been widely used to deal with computation-intensive embedded applications. However, with the continuous down-scaling of CMOS technology, within-die variations in the manufacturing process lead to a significant spread in the operating speeds of cores within homogeneous multi-core processors. Task scheduling approaches that do not consider such heterogeneity caused by within-die variations can lead to an overly pessimistic result in terms of performance. To realize optimal performance according to the actual maximum clock frequencies at which cores can run, we present a heterogeneity-aware schedule refining (HASR) scheme that fully exploits the heterogeneities of homogeneous multi-core processors in embedded domains. We analyze and show how the actual maximum frequencies of cores are used to guide the scheduling. In the scheme, representative chip operating points are selected and the corresponding optimal schedules are generated as candidate schedules. During the booting of each chip, according to the actual maximum clock frequencies of its cores, one of the candidate schedules is bound to the chip to maximize the performance. A set of applications is designed to evaluate the proposed scheme. Experimental results show that the proposed scheme can improve the performance by an average of 22.2% compared with the baseline schedule based on worst-case timing analysis. Compared with the conventional task scheduling approach based on the actual maximum clock frequencies, the proposed scheme also improves the performance by up to 12%.
Keywords: Schedule refining multi-core processor HETEROGENEITY Representative chip operating point
Energy Efficiency of a Multi-Core Processor by Tag Reduction
7
Authors: 郑龙, 董冕雄, Kaoru Ota, 金海, Song Guo, 马俊. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2011, No. 3, pp. 491-503 (13 pages)
We consider the energy saving problem for caches on a multi-core processor. Previous research on low-power processors offers various methods to reduce power dissipation, and tag reduction is one of them. This paper extends the tag reduction technique from a single-core processor to a multi-core processor and investigates the potential for energy saving on multi-core processors. We formulate our approach as an equivalent problem: finding an assignment of all instruction pages in physical memory to a set of cores such that the tag-reduction conflicts for each core are mostly avoided or reduced. We then propose three algorithms using different heuristics for this assignment problem. We provide experimental results collected from a real operating system, instead of the traditional approach of using a processor simulator that cannot simulate operating system functions and the full memory hierarchy. Experimental results show that our proposed algorithms can save total energy by up to 83.93% on an 8-core processor and 76.16% on a 4-core processor on average, compared with the case in which tag reduction is not used. They also significantly outperform the tag-reduction-based algorithm on a single-core processor.
Keywords: tag reduction multi-core processor energy efficiency
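Design and Digital Signal Processor Implementation of a Monitoring Platform for Non-Secure GSM Networks (cited by 3)
8
Authors: 邱烈义, 田增山, 涂正伟. 《科学技术与工程》 (PKU Core), 2014, No. 8, pp. 200-204, 209 (6 pages)
Based on the architecture of the GSM network and the characteristics of air-interface signals, and with reference to current research results, a scheme for direct silent monitoring over the air interface is designed. The implementation principle of the scheme, the module composition of the hardware platform, the interface design, and the baseband algorithm flow are analyzed. Finally, the algorithm flow is simulated on a DSP hardware platform, and the feasibility of the scheme is verified using some of the messages parsed by the algorithm. The platform can monitor both voice and short messages.
Keywords: global system for mobile communication (GSM) network; monitoring; hardware platform; interface design; digital signal processor (DSP) baseband algorithm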
Communication contention in APN list scheduling algorithm (cited by 5)
9
Authors: TANG XiaoYong, LI KenLi, PADUA Divid. Science in China (Series F), 2009, No. 1, pp. 59-69 (11 pages)
Task scheduling is an essential aspect of parallel processing systems. This NP-hard problem is typically formulated assuming fully connected homogeneous processors and ignoring contention on the communication links. However, in an arbitrary processor network (APN), communication contention has a strong influence on the execution time of a parallel application. This paper investigates the incorporation of contention awareness into task scheduling. The innovation is the idea of dynamically scheduling edges to links, for which we use an earliest-finish-communication-time search algorithm based on the shortest-path search method. The other novel idea proposed in this paper is a scheduling priority based on recursive rank computation on a heterogeneous arbitrary processor network. Finally, to reduce the time complexity of the algorithm, a parallel algorithm is proposed and a speedup of O(PPE) is achieved. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithm significantly outperforms classic and static communication-contention-aware algorithms, especially for parallel applications with high data transmission rates.
Keywords: list scheduling arbitrary processor network DAG communication contention parallel algorithm
OpenMDSP: Extending OpenMP to Program Multi-Core DSPs (cited by 1)
10
Authors: 何江舟, 陈文光, 陈光日, 郑纬民, 汤志忠, 叶寒栋. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, No. 2, pp. 316-331 (16 pages)
Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach used to program multi-core DSPs is based on proprietary vendor software development kits (SDKs), which only provide low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data parallel code with SDKs is a very tedious and error-prone approach. We believe that it is desirable to possess a high-level and portable parallel programming model for multi-core DSPs. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP, including 1) data placement directives that allow programmers to control the placement of global variables conveniently, 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance, and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement by the direct memory access (DMA) of a DSP. We implement the compiler and runtime system for OpenMDSP on Freescale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.
Keywords: OPENMP multi-core digital signal processor data parallelism Long Term Evolution
YHFT-QDSP: High-Performance Heterogeneous Multi-Core DSP
11
Authors: 陈书明, 万江华, 鲁建壮, 刘仲, 孙海燕, 孙永节, 刘衡竹, 刘祥远, 李振涛, 徐毅, 陈小文. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2010, No. 2, pp. 214-224 (11 pages)
Multi-core architectures are widely used to enhance microprocessor performance within a limited increase in time-to-market and power consumption of the chips. Toward the application of high-density data signal processing, this paper presents a novel heterogeneous multi-core architecture digital signal processor (DSP), YHFT-QDSP, with one RISC CPU core and 4 VLIW DSP cores. Through three kinds of interconnection, YHFT-QDSP provides high-efficiency message communication between the inner-chip RISC core and DSP cores, and among inner-chip and inter-chip DSP cores. A parallel programming platform is specifically developed for the heterogeneous multi-core architecture of YHFT-QDSP. This parallel programming environment provides a parallel support library and a friendly interface between high-level application software and the multi-core DSP. The 130 nm CMOS custom chip design results in a high-speed and moderate-power design. The results of typical benchmarks show that the interconnection structure of YHFT-QDSP is much better than other related structures and achieves better speedup when using the interconnection facilities in combination. YHFT-QDSP has been signed off and manufactured. Future applications of the multi-core chip could be found in 3G wireless base stations, high-performance radar, industrial applications, and so on.
Keywords: digital signal processor (DSP) multi-core ARCHITECTURE parallel programming custom design
Performance modeling of positive degraded task-pair with helper-thread in CMP
12
Authors: Gu Zhimin, Zheng Ninghan, Zhang Yi, Liu Changding, Tang Jie, Huang Yan. High Technology Letters (EI, CAS), 2010, No. 3, pp. 221-226 (6 pages)
The helper-thread of a task can hide the memory access time of irregular data on a chip multi-core processor (CMP). To construct a compiler that effectively supports the helper-thread of a task in a multi-core scenario based on the last-level shared cache, this paper studies its performance stable conditions. Because no existing model allows extensive investigation of the impact of stable conditions, we present the base of pre-computation, formalized by our degraded task-pair 〈T, T'〉 with the helper-thread, and analyze its stable conditions. Finally, a novel performance model and a method of constructing pre-computation based on our positive degraded task-pair are proposed. Our experiments show that the approach is effective. If memory level parallelism (MLP) is further exploited for the task-pair, the task-pair 〈T, T'〉 can reach better performance.
Keywords: chip multi-core processor (CMP) helper-thread pre-computation performance model
Mobile Positioning System Based on the Wireless Sensor Network in Buildings
13
Authors: Xiujun LI, Gang SUN, Xu WANG. Communications and Network, 2009, No. 2, pp. 96-100 (5 pages)
Built on the Intel multi-core embedded platform, using 802.11 wireless network protocols as the communication medium and combining radio-frequency communication with ultrasonic ranging, we implement a mobile terminal system in an intelligent building. It provides its holder with the following functions: 1) accurate positioning, 2) intelligent navigation, 3) video monitoring, and 4) wireless communication. The innovation of this paper is to apply multi-core computing to the embedded system to increase its computing speed and provide real-time performance, and to apply the system in indoor environments for emergencies or rescue.
Keywords: POSITIONING Intelligent NAVIGATION VIDEO Transmission Wireless communication Sensor Networks multi-core COMPUTING
A Novel Device for Real-Time Monitoring of High Frequency Phenomena in CENELEC PLC Band
14
Authors: Bashir Ahmed Siddiqui, Pertti Pakonen, Pekka Verho. Smart Grid and Renewable Energy, 2012, No. 2, pp. 152-157 (6 pages)
This paper proposes the design and development of a novel, portable, and low-cost intelligent electronic device (IED) for real-time monitoring of high frequency phenomena in the CENELEC PLC band. A high-speed floating-point digital signal processor (DSP) along with a 4 MSPS analog-to-digital converter (ADC) is used to develop the intelligent electronic device. An optimized algorithm that processes the analog signal in real time and extracts meaningful results using signal processing techniques has been implemented on the device. A laboratory environment has been set up with all the necessary equipment, including a load model developed to evaluate the performance of the IED. A smart meter and a concentrator are also connected to the low voltage (LV) network to monitor the PLC communication using the IED. The device has been tested in the laboratory and has produced very promising results for time domain as well as frequency domain analysis. Those results imply that the IED is fully capable of monitoring high frequency disturbances in the CENELEC PLC band.
Keywords: Power Line communication (PLC) DIGITAL Signal processor (DSP) Analog-to-Digital Converter (ADC) Fast FOURIER Transform (FFT) High Frequency (HF) Interference
Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture (cited by 13)
15
Authors: 郑方, 李宏亮, 吕晖, 过锋, 许晓红, 谢向辉. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2015, No. 1, pp. 145-162 (18 pages)
Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot be carried out efficiently due to the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs), which are heterogeneous processor cores for different application features with a unified ISA (instruction set architecture), a unified execution model, and shared memory that supports cache coherence. The DFMC processor can alleviate the memory wall problem by combining a series of cooperative computing techniques of CPEs, such as multi-pattern data stream transfer, an efficient register-level communication mechanism, and a fast hardware synchronization technique. These techniques are able to improve on-chip data reuse and optimize memory access performance. This paper illustrates an implementation of a full system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the effect of the cooperative computing techniques of CPEs is significant, with DGEMM (double-precision matrix multiplication) achieving an efficiency of 94%, FFT (fast Fourier transform) obtaining a performance of 207 GFLOPS, and FDTD (finite-difference time-domain) obtaining a performance of 27 GFLOPS.
Keywords: heterogeneous many-core processor data stream transfer register-level communication mechanism hardware synchronization technique processor prototype
An experiment of PMD compensation in 40-Gb/s PSBT transmission system (cited by 1)
16
Authors: 田凤, 席丽霞, 张晓光, 翁轩, 张光勇, 熊前进. Chinese Optics Letters (SCIE, EI, CAS, CSCD), 2010, No. 9, pp. 816-818 (3 pages)
An adaptive polarization mode dispersion (PMD) compensation experiment is reported in a 40-Gb/s phase shaped binary transmission (PSBT) communication system, with the use of a new digital signal processor (DSP)-based optical PMD compensator. PMD tolerance is found to be enhanced by 8 ps after PMD compensation with 1-dB optical signal-to-noise ratio (OSNR) penalty. Under the condition of fast change of states of polarization up to 85 rad/s in the fiber link, the performance of our PMD compensator undergoes the bit error ratio (BER) test for as long as 10 h.
Keywords: communication systems Digital signal processors Fiber optics Optical communication Optical systems POLARIZATION Signal processing Signal to noise ratio
Space time clock: An on-chip clock with 10^(-12) instability
17
Authors: Zhendong XU, Yingchun ZHANG, Pengfei LI, Yongsheng WANG, Limin DONG, Guodong XU. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2022, No. 10, pp. 247-253 (7 pages)
A high-precision and stable clock is extremely important in communication and navigation. The miniaturization of clocks is considered to be the trend to satisfy the demand for 5G and next-generation communications. Based on the concept of the meter bar and the principle of the constancy of light velocity, we designed a micro clock, the Space Time Clock (STC), with a size smaller than 1 mm × 1 mm and a power dissipation of less than 2 mW. Designed as an integrated circuit in 0.18 μm technology, the STC has an instability assessed to be 2.23×10^(-12), and the trend of the instability is inversely proportional to τ. With the potential ability to reach the level of 10instability on chip in the future, the period of the STC's signal is locked to the delay time defined by the meter bar, which keeps the time reference constant. Because of its superior performance, the STC is more suitable for mobile communication, PNT (Positioning, Navigation and Timing), embedded processors, and deep space applications, and it becomes the main payload of the ASRTU satellite, which is scheduled to launch next year and investigate the space environment.
Keywords: communication Embedded processor INSTABILITY Positioning navigation and timing Space time clock