指令压缩技术能够克服传统超长指令字(very long instruction word,VLIW)结构的指令高速缓冲(cache)中长指令字密度低的缺陷,使长指令字中的各条指令能紧密地排列在高速缓冲行(cache line)中,但可能导致长指令字分置于两个cache line,...指令压缩技术能够克服传统超长指令字(very long instruction word,VLIW)结构的指令高速缓冲(cache)中长指令字密度低的缺陷,使长指令字中的各条指令能紧密地排列在高速缓冲行(cache line)中,但可能导致长指令字分置于两个cache line,使其不能同时参与取指与发射,从而成为处理器的性能瓶颈.受到分置cache line的影响,传统提升循环效率的软件流水方法性能下降.高性能变长指令发射窗的机制能够解决分离指令字带来的取指发射问题,为取指流水线提供高效连续的指令流,特别地,该机制缓存循环的一次迭代,硬件支持循环的软件流水,有效地增强VLIW结构的数字信号处理器(digital signal processor,DSP)的性能.通过搭建时钟精确的处理器仿真模型,并基于DSP?IMG库上进行仿真,结果显示,采用两级指令发射窗机制,平均性能提高约21.89%.展开更多
Aims to provide the block architecture of CoStar3400 DSP that is a high performance, low power and scalable VLIW DSP core, it efficiently deployed a variable-length execution set (VLES) execution model which utilizes ...Aims to provide the block architecture of CoStar3400 DSP that is a high performance, low power and scalable VLIW DSP core, it efficiently deployed a variable-length execution set (VLES) execution model which utilizes the maximum parallelism by allowing multiple address generations and data arithmetic logic units to execute multiple instructions in a single clock cycle. The scalability was provided mainly in using more or less number of functional units according to the intended application. Low power support was added by careful architectural design techniques such as fine-grain clock gating and activation of only the required number of control signals at each stage of the pipeline. The said features of the core make it a suitable candidate for many SoC configurations, especially for compute intensive applications such as wire-line and wireless communications, including infrastructure and subscriber communications. The embedded system designers can efficiently use the scalability and VLIW features of the core by scaling the number of execution units according to specific needs of the application to effectively reduce the power consumption, chip area and time to market the intended final product.展开更多
Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for mat...Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for matrix inversion. A global register file (RF) is used to connect al the clusters. Two nearby clusters share a local register file. The instruction sets are also designed for the VLIW processor. Experimental results show that the proposed VLIW architecture takes only 45 latency to invert a 4 × 4 matrix when running at 150 MHz. The proposed design is roughly five times faster than the DSP solution in processing speed.展开更多
文摘指令压缩技术能够克服传统超长指令字(very long instruction word,VLIW)结构的指令高速缓冲(cache)中长指令字密度低的缺陷,使长指令字中的各条指令能紧密地排列在高速缓冲行(cache line)中,但可能导致长指令字分置于两个cache line,使其不能同时参与取指与发射,从而成为处理器的性能瓶颈.受到分置cache line的影响,传统提升循环效率的软件流水方法性能下降.高性能变长指令发射窗的机制能够解决分离指令字带来的取指发射问题,为取指流水线提供高效连续的指令流,特别地,该机制缓存循环的一次迭代,硬件支持循环的软件流水,有效地增强VLIW结构的数字信号处理器(digital signal processor,DSP)的性能.通过搭建时钟精确的处理器仿真模型,并基于DSP?IMG库上进行仿真,结果显示,采用两级指令发射窗机制,平均性能提高约21.89%.
基金the National Natural Science Foundation of China(Grant No.60425413)COMSATS Institute of Information Technology, Pakistan
文摘Aims to provide the block architecture of CoStar3400 DSP that is a high performance, low power and scalable VLIW DSP core, it efficiently deployed a variable-length execution set (VLES) execution model which utilizes the maximum parallelism by allowing multiple address generations and data arithmetic logic units to execute multiple instructions in a single clock cycle. The scalability was provided mainly in using more or less number of functional units according to the intended application. Low power support was added by careful architectural design techniques such as fine-grain clock gating and activation of only the required number of control signals at each stage of the pipeline. The said features of the core make it a suitable candidate for many SoC configurations, especially for compute intensive applications such as wire-line and wireless communications, including infrastructure and subscriber communications. The embedded system designers can efficiently use the scalability and VLIW features of the core by scaling the number of execution units according to specific needs of the application to effectively reduce the power consumption, chip area and time to market the intended final product.
基金supported by the National Natural Science Foundation of China(6110015561227004+4 种基金613720716137213161201289)the Fundamental Research Funds of the Central Universities of China(K5051302096JB140207)
文摘Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for matrix inversion. A global register file (RF) is used to connect al the clusters. Two nearby clusters share a local register file. The instruction sets are also designed for the VLIW processor. Experimental results show that the proposed VLIW architecture takes only 45 latency to invert a 4 × 4 matrix when running at 150 MHz. The proposed design is roughly five times faster than the DSP solution in processing speed.