Abstract
The C870 stream processor has a three-level memory hierarchy with three access modes, and its stream architecture is particularly well suited to compute-intensive applications with good data parallelism and little global data reuse. Based on the hardware and software structure of the C870 stream processor, and targeting problems that involve highly floating-point-intensive computation over massive numbers of data elements in parallel, this paper proposes using computation to hide memory access latency, thereby raising the bandwidth of the memory system. It also presents, for the first time, a method for large-matrix computation on the C870 stream processor that uses on-chip shared memory, and evaluates the optimization on square matrices of size 5000*5000 and 2000*2000. The experimental results show that optimizing the computation with on-chip shared memory improves floating-point performance by more than 7 times, effectively increasing memory-system bandwidth and the efficiency of parallel computation.
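The abstract does not reproduce the paper's kernel, so the following is only a minimal sketch of the general technique it names: staging tiles of a large square matrix in on-chip shared memory so that each value fetched from global memory is reused many times. It assumes the matrix computation is a single-precision matrix product, which the abstract does not confirm. The kernel name `matMulTiled`, the tile width `TILE = 16`, and the test values are hypothetical choices made for illustration.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width: 16 x 16 = 256 threads per block

// Computes C = A * B for square n x n row-major matrices.
// Each block stages one TILE x TILE tile of A and of B in shared memory,
// so every global-memory value is loaded once per tile and reused TILE times.
__global__ void matMulTiled(const float* A, const float* B, float* C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;

        // Zero-pad out-of-range elements so sizes such as 2000 or 5000,
        // which are not multiples of TILE, are handled correctly.
        As[threadIdx.y][threadIdx.x] =
            (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // tile fully loaded before anyone reads it

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone done before the tile is overwritten
    }

    if (row < n && col < n)
        C[row * n + col] = acc;
}

int main()
{
    const int n = 2000;  // one of the two matrix sizes named in the abstract
    const size_t count = (size_t)n * n;
    const size_t bytes = count * sizeof(float);

    // Illustrative inputs: A is all ones, B is all twos.
    std::vector<float> hA(count, 1.0f), hB(count, 2.0f), hC(count, 0.0f);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);

    // With these inputs every element of C should equal 2 * n.
    float maxErr = 0.0f;
    for (size_t i = 0; i < count; ++i)
        maxErr = fmaxf(maxErr, fabsf(hC[i] - 2.0f * n));
    printf("max abs error: %g\n", maxErr);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return 0;
}
```

In this sketch each thread block reads a given element of A or B from global memory once per tile rather than once per multiply-add, which is the kind of data reuse that shared memory makes possible. The speedup it yields on any particular device is not implied by this example.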
Source
Microcomputer Information (《微计算机信息》)
Peking University Core Journal (北大核心)
2008, Issue 24, pp. 303-305 (3 pages)
Control & Automation