期刊文献+
共找到4篇文章
< 1 >
每页显示 20 50 100
A TSE based design for MMSE and QRD of MIMO systems based on ASIP
1
作者 冯雪林 SHI Jinglin +3 位作者 CHEN Yang FU Yanlu ZHANG Qineng XIAO Feng 《High Technology Letters》 EI CAS 2023年第2期166-173,共8页
A Taylor series expansion(TSE) based design for minimum mean-square error(MMSE) and QR decomposition(QRD) of multi-input and multi-output(MIMO) systems is proposed based on application specific instruction set process... A Taylor series expansion(TSE) based design for minimum mean-square error(MMSE) and QR decomposition(QRD) of multi-input and multi-output(MIMO) systems is proposed based on application specific instruction set processor(ASIP), which uses TSE algorithm instead of resource-consuming reciprocal and reciprocal square root(RSR) operations.The aim is to give a high performance implementation for MMSE and QRD in one programmable platform simultaneously.Furthermore, instruction set architecture(ISA) and the allocation of data paths in single instruction multiple data-very long instruction word(SIMD-VLIW) architecture are provided, offering more data parallelism and instruction parallelism for different dimension matrices and operation types.Meanwhile, multiple level numerical precision can be achieved with flexible table size and expansion order in TSE ISA.The ASIP has been implemented to a 28 nm CMOS process and frequency reaches 800 MHz.Experimental results show that the proposed design provides perfect numerical precision within the fixed bit-width of the ASIP, higher matrix processing rate better than the requirements of 5G system and more rate-area efficiency comparable with ASIC implementations. 展开更多
关键词 multi-input and multi-output(MIMO) minimum mean-square error(MMSE) QR decomposition(QRD) Taylor series expansion(TSE) application specific instruction set processor(ASIP) instruction set architecture(ISA) single instruction multiple data(SIMD) very long instruction word(VLIW)
下载PDF
A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications
2
作者 陈鹏 张磊 +1 位作者 韩银和 陈云霁 《Journal of Computer Science & Technology》 SCIE EI CSCD 2014年第2期239-246,共8页
The combination of growing transistor counts and limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"), that is only a small fraction of chip can run at full speed... The combination of growing transistor counts and limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"), that is only a small fraction of chip can run at full speed during a period of time. Designing accelerators for specific applications or algorithms is considered to be one of the most promising approaches to improving energy-efficiency. However, most current design methods for accelerators are dedicated for certain applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications. The DFG clusters are then used for accelerators design. This is because a DFC is the largest program unit which is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006, acquire over 300 DFGs hotspots by using LLVM compiler tool, and divide them into 15 clusters based on graph similarity. Secondly, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. Our results show that the proposed DFG clustering and FISC design can speed up SPEC benchmarks 6.2X on average. 展开更多
关键词 dataflow graph many-accelerator CLUSTERING function instruction set architecture
原文传递
An Energy-Efficient Instruction Scheduler Design with Two-Level Shelving and Adaptive Banking 被引量:3
3
作者 赵雨来 李险峰 +1 位作者 佟冬 程旭 《Journal of Computer Science & Technology》 SCIE EI CSCD 2007年第1期15-24,共10页
Mainstream processors implement the instruction scheduler using a monolithic CAM-based issue queue (IQ), which consumes increasingly high energy as its size scales. In particular, its instruction wakeup logic accoun... Mainstream processors implement the instruction scheduler using a monolithic CAM-based issue queue (IQ), which consumes increasingly high energy as its size scales. In particular, its instruction wakeup logic accounts for a major portion of the consumed energy. Our study shows that instructions with 2 non-ready operands (called 2OP instructions) are in small percentage, but tend to spend long latencies in the IQ. They can be effectively shelved in a small RAM-based waiting instruction buffer (WIB) and steered into the IQ at appropriate time. With this two-level shelving ability, half of the CAM tag comparators are eliminated in the IQ, which significantly reduces the energy of wakeup operation. In addition, we propose an adaptive banking scheme to downsize the IQ and reduce the bit-width of tag comparators. Experiments indicate that for an 8-wide issue superscalar or SMT proeessor,the energy consumption of the instruction scheduler can be reduced by 67%. Furthermore, the new design has potentially faster scheduler clock speed while maintaining close IPC to the monolithic scheduler design. Compared with the previous work on eliminating tags through prediction, our design is superior in terms of both energy reduction and SMT support. 展开更多
关键词 content associative memory (CAM) energy-efficient architecture instruction scheduler tag elimination waiting instruction buffer
原文传递
Single-Cycle Bit Permutations with MOMR Execution
4
作者 李佩露 杨骁 史志杰 《Journal of Computer Science & Technology》 SCIE EI CSCD 2005年第5期577-585,共9页
Secure computing paradigms impose new architectural challenges for general-purpose processors. Cryptographic processing is needed for secure communications, storage, and computations. We identify two categories of ope... Secure computing paradigms impose new architectural challenges for general-purpose processors. Cryptographic processing is needed for secure communications, storage, and computations. We identify two categories of operations in symmetric-key and public-key cryptographic algorithms that are not common in previous general-purpose workloads: advanced bit operations within a word and multi-word operations. We define MOMR (Multiple Operands Multiple Results) execution or datarich execution as a unified solution to both challenges. It allows arbitrary n-bit permutations to be achieved in one or two cycles, rather than O(n) cycles as in existing RISC processors. It also enables significant acceleration of multiword multiplications needed by public-key ciphers. We propose two implementations of MOMR: one employs only hardware changes while the other uses Instruction Set Architecture (ISA) support. We show that MOMR execution leverages available resources in typical multi-issue processors with minimal additional cost. Multi-issue processors enhanced with MOMR units provide additional speedup over standard multi-issue processors with the same datapath. MOMR is a general architectural solution for word-oriented processor architectures to incorporate datarich operations. 展开更多
关键词 PERMUTATION bit permutations CRYPTOGRAPHY cryptographic acceleration security multi-word operation datarich execution MOMR instruction set architecture ISA PROCESSOR high performance secure computing
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部