Parallelization and optimization of huge-scale sequence alignment computation (cited by: 2)
Abstract: This paper studies the problem of huge-scale sequence alignment computation in bioinformatics. It addresses the memory, I/O, and computation-time bottlenecks that the existing e-PCR software package runs into when checking primer amplification against wheat gene sequences. Using data and task partitioning, the most critical step, the alignment of primers against template sequences, is made to run in parallel at large scale. The program is implemented with MPI under a master-slave communication framework and is further optimized with respect to task reduction, load balancing, fault tolerance, and concurrent execution of multiple jobs. It ran successfully at the scale of a thousand cores on a 100-Tflops supercomputer, so that the wheat primer amplification alignment, originally expected to take several years, can be completed within dozens of days.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2011, Issue A02, pp. 32-35 (4 pages)
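Funding: National 863 Program of China (2006AA01A116); Chinese Academy of Sciences "Eleventh Five-Year" Informatization Special Project (INFO-115-B01); Chinese Academy of Sciences Knowledge Innovation Program (CNIC_QN_10004)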
Keywords: parallel computing; bioinformatics; molecular marker; sequence alignment; task partitioning; e-PCR
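The abstract names data and task partitioning with a master-slave scheme and load balancing, but gives no code. The following is a minimal sketch of that general pattern, not the authors' implementation: Python threads and a shared queue stand in for MPI ranks and messages, exact substring search stands in for the e-PCR search, and the names `make_tasks`, `align_block`, and `run_master_worker` are hypothetical.

```python
import queue
import threading


def make_tasks(num_primers, num_templates, block):
    """Cut the primer x template space into independent tasks.

    Each task is (first primer, one past last primer, template index).
    """
    return [(start, min(start + block, num_primers), t)
            for t in range(num_templates)
            for start in range(0, num_primers, block)]


def align_block(primers, templates, task):
    """Toy stand-in for the real search: exact occurrences of each primer."""
    start, stop, t = task
    hits = []
    for i in range(start, stop):
        pos = templates[t].find(primers[i])
        while pos != -1:
            hits.append((i, t, pos))
            pos = templates[t].find(primers[i], pos + 1)
    return hits


def run_master_worker(primers, templates, block=2, workers=3):
    """Hand tasks out on demand, so faster workers simply take more of them."""
    pending = queue.Queue()  # plays the role of the master's task list
    for task in make_tasks(len(primers), len(templates), block):
        pending.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            hits = align_block(primers, templates, task)
            with lock:
                results.extend(hits)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sorted(results)


if __name__ == "__main__":
    demo_primers = ["ACG", "TTT", "GGA", "CAT", "AAA"]
    demo_templates = ["ACGTTTACG", "GGACATCAT"]
    for hit in run_master_worker(demo_primers, demo_templates):
        print(hit)  # (primer index, template index, position)
```

Because each task touches only its own primer block and one template, the result does not depend on the block size or the number of workers. Only the time to finish changes, which is what makes on-demand dispatch a simple form of load balancing.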