Data Dependence Graph Directed Scheduling for Clustered VLIW Architectures

Abstract: This paper presents an instruction scheduling and cluster assignment approach for clustered very long instruction word (VLIW) processors. The technique produces high-performance code by simultaneously balancing instructions among clusters and minimizing inter-cluster data communication. The scheme is evaluated on benchmarks extracted from UTDSP. Results show speed-ups of up to 44% over previously used techniques, with average speed-ups ranging from 14% (2-cluster) to 18% (4-cluster).
Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS), 2010, No. 3, pp. 299-306 (8 pages); Journal of Tsinghua University (Natural Science Edition), English Edition
Funding: Supported by the National Natural Science Foundation of China (No. 60236030), the National Research Foundation for the Doctoral Program of Higher Education of China (No. 20050003083), and the Tsinghua Basic Research Foundation
Keywords: clustered VLIW processor; instruction scheduling; cluster assignments