Abstract
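Since high-throughput sequencing came into use in January 2008, sequencing throughput has risen steadily while cost has fallen. The explosive growth of genomic data, however, has outpaced Moore's Law, and the capacity to process massive data sets has become the bottleneck limiting wider adoption of gene sequencing. Targeting re-sequencing algorithms based on a hash index, this work analyzes their computation and memory-access behavior and proposes an architecture that uses a field programmable gate array (FPGA) as a co-processor, designed and implemented on the Convey HC-1ex platform. Each basic processing element is fully pipelined internally and uses FIFOs to decouple its compute modules from its memory-access modules, so it can execute the complete core flow of the re-sequencing algorithm. By binding processing elements one-to-one to memory ports, 64 parallel processing pipelines are implemented on four Xilinx Virtex-6 LX760 devices, with an aggregate average memory read bandwidth of 22.59 GBps. Compared with an 8-core Intel Xeon processor, performance improves by a factor of 28.5.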
Since January 2008, when next-generation DNA sequencing platforms were introduced, sequencing throughput has improved significantly. However, the technology is challenged by the volume of sequencing data, which is growing even faster than Moore's Law. As an emerging data-intensive workload, high-throughput re-sequencing tools such as hash-based programs show different characteristics from traditional computational applications. Low arithmetic intensity and irregular memory access patterns are both major sources of inefficiency on commodity multi-core platforms. In this paper, we propose a co-processor architecture for accelerating a short-read mapping algorithm. The complete mapping flow in one processing element (PE) is bound to an exclusive memory port to improve parallel performance. The proposed architecture is implemented on a Convey HC-1ex reconfigurable computer. The design includes 64 parallel PEs on 4 Xilinx Virtex-6 LX760 FPGAs operating at 150 MHz. Compared with an 8-core Intel Xeon CPU, it achieves a speedup of 28.5 times and an average memory read bandwidth of 22.59 GBps. The proposed design can therefore potentially address the large-scale data challenge and be applied to high-throughput re-sequencing.
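To make the workload concrete, the following is a minimal software sketch of hash-index-based short-read mapping in general. It is not the paper's algorithm or its FPGA design: the function names (`build_index`, `map_read`), the seed length `k`, the non-overlapping seeding scheme, and the Hamming-distance verification step are all illustrative assumptions. It does show why the workload has low arithmetic intensity and irregular memory access: each seed triggers a hash lookup followed by a fetch from an unpredictable position in the reference.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Hash index: every k-mer of the reference -> list of start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, index: dict, k: int,
             max_mismatches: int = 1) -> list:
    """Return sorted reference positions where the read aligns.

    Assumed scheme: cut the read into non-overlapping k-mer seeds, look
    each seed up in the index, then verify every candidate location by
    comparing the whole read against the reference (substitutions only).
    """
    hits = set()
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):      # irregular access: hash lookup
            start = pos - offset             # candidate alignment start
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]  # scattered fetch
            if hamming(read, window) <= max_mismatches:
                hits.add(start)
    return sorted(hits)


if __name__ == "__main__":
    ref = "ACGTACGTTAGCCGATAGGCTA"   # toy reference, for illustration only
    idx = build_index(ref, k=4)
    print(map_read(ref[8:20], ref, idx, k=4))
```

In this sketch, every read is mapped independently of the others, which is the property that lets a design run many mapping pipelines in parallel, each with its own path to memory.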
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journals (北大核心)
2014, No. 9, pp. 1980-1992 (13 pages)
Journal of Computer Research and Development
Funding
National Basic Research Program of China (973 Program), Grant No. 2012CB316502
Keywords
high-throughput sequencing
short-read mapping
hash index
field programmable gate array (FPGA)
heterogeneous architecture