期刊文献+
共找到8篇文章
< 1 >
每页显示 20 50 100
A TSE based design for MMSE and QRD of MIMO systems based on ASIP
1
作者 冯雪林 SHI Jinglin +3 位作者 CHEN Yang FU Yanlu ZHANG Qineng XIAO Feng 《High Technology Letters》 EI CAS 2023年第2期166-173,共8页
A Taylor series expansion(TSE) based design for minimum mean-square error(MMSE) and QR decomposition(QRD) of multi-input and multi-output(MIMO) systems is proposed based on application specific instruction set process... A Taylor series expansion(TSE) based design for minimum mean-square error(MMSE) and QR decomposition(QRD) of multi-input and multi-output(MIMO) systems is proposed based on application specific instruction set processor(ASIP), which uses TSE algorithm instead of resource-consuming reciprocal and reciprocal square root(RSR) operations.The aim is to give a high performance implementation for MMSE and QRD in one programmable platform simultaneously.Furthermore, instruction set architecture(ISA) and the allocation of data paths in single instruction multiple data-very long instruction word(SIMD-VLIW) architecture are provided, offering more data parallelism and instruction parallelism for different dimension matrices and operation types.Meanwhile, multiple level numerical precision can be achieved with flexible table size and expansion order in TSE ISA.The ASIP has been implemented to a 28 nm CMOS process and frequency reaches 800 MHz.Experimental results show that the proposed design provides perfect numerical precision within the fixed bit-width of the ASIP, higher matrix processing rate better than the requirements of 5G system and more rate-area efficiency comparable with ASIC implementations. 展开更多
关键词 multi-input and multi-output(MIMO) minimum mean-square error(MMSE) QR decomposition(QRD) Taylor series expansion(TSE) application specific instruction set processor(ASIP) instruction set architecture(ISA) single instruction multiple data(SIMD) very long instruction word(VLIW)
下载PDF
A scalable and low power VLIW DSP core for embedded system design 被引量:1
2
作者 Sheraz Anjum 陈杰 +4 位作者 韩亮 林川 张晓潇 苏叶华 程亚奇 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2008年第2期172-175,共4页
Aims to provide the block architecture of CoStar3400 DSP that is a high performance, low power and scalable VLIW DSP core, it efficiently deployed a variable-length execution set (VLES) execution model which utilizes ... Aims to provide the block architecture of CoStar3400 DSP that is a high performance, low power and scalable VLIW DSP core, it efficiently deployed a variable-length execution set (VLES) execution model which utilizes the maximum parallelism by allowing multiple address generations and data arithmetic logic units to execute multiple instructions in a single clock cycle. The scalability was provided mainly in using more or less number of functional units according to the intended application. Low power support was added by careful architectural design techniques such as fine-grain clock gating and activation of only the required number of control signals at each stage of the pipeline. The said features of the core make it a suitable candidate for many SoC configurations, especially for compute intensive applications such as wire-line and wireless communications, including infrastructure and subscriber communications. The embedded system designers can efficiently use the scalability and VLIW features of the core by scaling the number of execution units according to specific needs of the application to effectively reduce the power consumption, chip area and time to market the intended final product. 展开更多
关键词 very long instruction word (VLIW) low Dower DSP compute intensive system on chip (SoC)
下载PDF
Efficient matrix inversion based on VLIW architecture
3
作者 Li Zhang Fu Li Guangming Shi 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2014年第3期393-398,共6页
Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for mat... Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for matrix inversion. A global register file (RF) is used to connect al the clusters. Two nearby clusters share a local register file. The instruction sets are also designed for the VLIW processor. Experimental results show that the proposed VLIW architecture takes only 45 latency to invert a 4 × 4 matrix when running at 150 MHz. The proposed design is roughly five times faster than the DSP solution in processing speed. 展开更多
关键词 matrix inversion very long instruction word (VLIW) latency register file (RF) cluster.
下载PDF
Architecture Design of a Variable Length Instruction Set VLIW DSP 被引量:11
4
作者 沈钲 何虎 +2 位作者 杨旭 贾迪 孙义和 《Tsinghua Science and Technology》 SCIE EI CAS 2009年第5期561-569,共9页
The cost of the central register file and the size of the program code limit the scalability of very long instruction word(VLIW) processors with increasing numbers of functional units.This paper presents the archite... The cost of the central register file and the size of the program code limit the scalability of very long instruction word(VLIW) processors with increasing numbers of functional units.This paper presents the architectural design of a six-way VLIW digital signal processor(DSP) with clustered register files.The architecture uses a variable length instruction set and supports dynamic instruction dispatching.The one-level memory system architecture of the processor includes 16-KB instruction and data caches and 16-KB instruction and data on-chip RAM.A compiler based on the Open64 was developed for the system.Evaluations show that the processor is suitable for high performance applications with a high code density and small program code size. 展开更多
关键词 digital signal processor(DSP) very long instruction word(VLIW) variable length instruction set clustered register file
原文传递
Feedback Cache Mechanism for Dynamically Reconfigurable VLIW Processors 被引量:2
5
作者 Sensen Hu Weixing Ji Yizhuo Wang 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2017年第3期303-316,共14页
Very Long Instruction Word (VLIW) architectures are commonly used in application-specific domains due to their parallelism and low-power characteristics. Recently, parameterization of such architectures allows for r... Very Long Instruction Word (VLIW) architectures are commonly used in application-specific domains due to their parallelism and low-power characteristics. Recently, parameterization of such architectures allows for runtime adaptation of the issue-width to match the inherent Instruction Level Parallelism (ILP) of an application. One implementation of such an approach is that the event of the issue-width switching dynamically triggers the reconfiguration of the data cache at runtime. In this paper, the relationship between cache resizing and issue-width is well investigated. We have observed that the requirement of the cache does not always correlate with the issue- width of the VLIW processor. To further coordinate the cache resizing with the changing issue-width, we present a novel feedback mechanism to "block" the low yields of cache resizing when the issue-width changes. In this manner, our feedback cache mechanism has a coordinated effort with the issue-width changes, which leads to a noticeable improvement of the cache performance. The experiments show that there is 10% energy savings as well as a 2.3% cache misses decline on average achieved, compared with the cache without the feedback mechanism. Therefore, the feedback mechanism is proven to have the capability to ensure more benefits are achieved from the dynamic and frequent reconfiguration. 展开更多
关键词 RECONFIGURATION very long instruction word (VLIW) issue-width CACHE FEEDBACK
原文传递
Research and Design of Reconfigurable Composite Field Multiplication in Symmetric Cipher Algorithms 被引量:1
6
作者 SU Yang ZHANG Mingshu YANG Kai 《Wuhan University Journal of Natural Sciences》 CAS CSCD 2016年第3期235-241,共7页
The composite field multiplication is an important and complex module in symmetric cipher algorithms, and its realization performance directly restricts the processing speed of symmetric cipher algorithms. Based on th... The composite field multiplication is an important and complex module in symmetric cipher algorithms, and its realization performance directly restricts the processing speed of symmetric cipher algorithms. Based on the characteristics of composite field multiplication in symmetric cipher algorithms and the realization principle of its reconfigurable architectures, this paper describes the reconfigurable composite field multiplication over GF((2^8)k) (k=1,2,3,4) in RISC (reduced instruction set computer) processor and VLIW (very long instruction word) processor architecture, respectively. Through configuration, the architectures can realize the composite field multiplication over GF(2^8), GF ((2^8)2), GF((28)3) and GF((28)4) flexibly and efficiently. We simulated the function of circuits and synthesized the reconfigurable design based on the 0.18 μm CMOS (complementary metal oxide semiconductor) standard cell library and the comparison with other same kind designs. The result shows that the reconfigurable design proposed in the paper can provide higher efficiency under the premise of flexibility. 展开更多
关键词 RECONFIGURABLE composite field multiplication symmetric cipher algorithm RISC VLIW very long instruction word
原文传递
Trace Software Pipelining
7
作者 王剑 AndreasKrall 《Journal of Computer Science & Technology》 SCIE EI CSCD 1995年第6期481-490,共10页
Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches. This paper presents a novel global software pipelining technique, called Th... Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches. This paper presents a novel global software pipelining technique, called Thace Software Pipelining,targeted to the instruction-level parallel processors such as Very Long Instruc-tion Word (VLIW) and superscalar machines. Thace software pipelining applies a global code scheduling technique to compact the original loop body. The re-sulting loop is called a trace software pipelined (TSP) code. The trace softwrae pipelined code can be directly executed with special architectural support or call be transformed into a globally software pipelined loop for the current VLIW and superscalar processors. Thus, exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique. This makes our new technique very promis-ing in practical compilers. Finally, we also present the preliminary experimental results to support our new approach. 展开更多
关键词 instruction-level parallelism fine-grain parallelism software pipelining loop scheduling very long instruction word (VLIW) superscalar processor
原文传递
Leakage-Aware Modulo Scheduling for Embedded VLIW Processors
8
作者 关永 薛京灵 《Journal of Computer Science & Technology》 SCIE EI CSCD 2011年第3期405-417,共13页
As semi-conductor technologies move down to the nanometer scale, leakage power has become a significant component of the total power consumption. In this paper, we present a leakage-aware modulo scheduling algorithm t... As semi-conductor technologies move down to the nanometer scale, leakage power has become a significant component of the total power consumption. In this paper, we present a leakage-aware modulo scheduling algorithm to achieve leakage energy saving for applications with loops on Very Long Instruction Word (VLIW) architectures. The proposed algorithm is designed to maximize the idleness of function units integrated with the dual-threshold domino logic, and reduce the number of transitions between the active and sleep modes. We have implemented our technique in the Trimaran compiler and conducted experiments using a set of embedded benchmarks from DSPstone and Mibench on the cycle-accurate VLIW simulator of Trimaran. The results show that our technique achieves significant leakage energy saving compared with a previously published DAG-based (Directed Acyclic Graph) leakage-aware scheduling algorithm. 展开更多
关键词 leakage power very long instruction word (VLIW) software pipelining modulo scheduling
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部