Abstract
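Based on an in-depth analysis of the computational characteristics of the K-means algorithm, this paper proposes and implements a fine-grained, parallel, floating-point K-means algorithm on an FPGA platform. The design uses a task-partitioning strategy in which an array of PEs processes data in parallel, which balances the load across the processing elements. A data-driven pipeline hides off-chip memory access latency. The computing array is a master-slave multi-PE structure based on a systolic array, and four PEs were successfully integrated on a single FPGA (XC5VLX330). Experimental results show that the proposed K-means accelerator architecture scales well. In our tests, the implementation achieved a 15x speedup over a single-processor program running on a 2.66 GHz Pentium 4.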
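The task-partitioning and load-balancing idea can be illustrated in software. The Python code below is a minimal sketch, not the authors' hardware design. It assumes three things. First, the rows of the data matrix are split into contiguous blocks whose sizes differ by at most one. Second, each block is handled by one PE. Third, a master merges the per-PE partial sums before recomputing the centroids. The functions `partition_rows`, `pe_assign` and `kmeans` are hypothetical names introduced only for illustration, and the PEs run one after another here rather than in parallel.

```python
from typing import Dict, List, Optional, Sequence


def partition_rows(n_rows: int, n_pes: int) -> List[range]:
    """Split rows 0..n_rows-1 into n_pes contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_pes)
    blocks, start = [], 0
    for pe in range(n_pes):
        size = base + (1 if pe < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (the square root is unnecessary for comparisons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pe_assign(data, centroids, rows):
    """Work of one hypothetical slave PE: label its rows and return per-cluster partial sums."""
    k, dim = len(centroids), len(centroids[0])
    labels: Dict[int, int] = {}
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for r in rows:
        best = min(range(k), key=lambda c: sq_dist(data[r], centroids[c]))
        labels[r] = best
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += data[r][d]
    return labels, sums, counts


def kmeans(data, init_centroids, n_pes: int = 4, max_iter: int = 100):
    centroids = [list(c) for c in init_centroids]
    k, dim = len(centroids), len(centroids[0])
    blocks = partition_rows(len(data), n_pes)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [0] * len(data)
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        # In hardware each block would run on its own PE; here the blocks run sequentially.
        for rows in blocks:
            part_labels, sums, counts = pe_assign(data, centroids, rows)
            for r, c in part_labels.items():
                new_labels[r] = c
            # Master step: accumulate the partial sums reported by every PE.
            for c in range(k):
                total_counts[c] += counts[c]
                for d in range(dim):
                    total_sums[c][d] += sums[c][d]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            if total_counts[c]:  # leave an empty cluster's centroid unchanged
                centroids[c] = [s / total_counts[c] for s in total_sums[c]]
    return labels, centroids
```

Each PE returns per-cluster sums and counts rather than raw points. As a result, the master merges only k × dim values per PE in each iteration. The abstract does not say whether the FPGA design accumulates its results in exactly this way.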
We propose a systolic array structure, consisting of one master PE and multiple slave PEs, for a fine-grained hardware implementation of K-means on an FPGA. Tasks are partitioned by rows and assigned to PEs to balance the load. We exploit data-reuse schemes to reduce the number of loads from external memory. To our knowledge, our implementation with 4 PEs on an XC5VLX330 is the only FPGA accelerator that implements the complete K-means clustering algorithm. The experimental results show a speedup of more than 15x over the Cluster 3.0 software running on a PC with a 2.66 GHz Pentium 4 CPU.
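A systolic arrangement of one master and several slave PEs can be modeled cycle by cycle. The sketch below is an assumption, not the published microarchitecture. Each hypothetical `SlavePE` keeps its block of rows in local memory. The master streams the centroids into the first PE, one per cycle, and each centroid then moves one PE to the right per cycle. Each centroid is therefore fetched once and reused by every PE, which is one way a data-reuse scheme could reduce external memory traffic. The floating-point datapath, the pipeline inside each PE, and the centroid update are omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A token is (centroid id, centroid coordinates); it moves one PE to the right each cycle.
Token = Tuple[int, Sequence[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SlavePE:
    """Hypothetical slave PE holding a contiguous block of data rows in local memory."""

    def __init__(self, rows: List[Sequence[float]]):
        self.rows = rows
        self.best_d = [math.inf] * len(rows)
        self.best_c = [-1] * len(rows)

    def step(self, token: Optional[Token]) -> None:
        """Compare the centroid currently held by this PE with every local row."""
        if token is None:  # pipeline bubble: nothing to do this cycle
            return
        c_id, centroid = token
        for i, row in enumerate(self.rows):
            d = sq_dist(row, centroid)
            if d < self.best_d[i]:  # strict '<': the lower centroid id wins ties
                self.best_d[i], self.best_c[i] = d, c_id


def systolic_assign(data, centroids, n_pes: int = 4) -> Tuple[List[int], int]:
    """Return (label of every row, number of cycles the chain needed)."""
    # Row partitioning: contiguous blocks whose sizes differ by at most one.
    base, extra = divmod(len(data), n_pes)
    pes, start = [], 0
    for j in range(n_pes):
        size = base + (1 if j < extra else 0)
        pes.append(SlavePE(data[start:start + size]))
        start += size

    regs: List[Optional[Token]] = [None] * n_pes  # token held by each PE this cycle
    feed = iter(enumerate(centroids))  # the master streams centroids, one per cycle
    cycles = 0
    while True:
        # Shift right: PE j now holds what PE j-1 held last cycle; PE 0 gets a new centroid.
        regs = [next(feed, None)] + regs[:-1]
        if all(t is None for t in regs):
            break  # every centroid has passed through the whole chain
        cycles += 1
        for pe, tok in zip(pes, regs):
            pe.step(tok)
    labels = [c for pe in pes for c in pe.best_c]
    return labels, cycles
```

With k centroids and P PEs, the model completes the assignment step in k + P - 1 cycles, whereas a single PE would need k × P block visits to cover every (centroid, block) pair. The real accelerator's timing depends on details the abstract does not give.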
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal (北大核心)
2009, Issue A01, pp. 64-67 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (grant 2007AA01Z106)