Parallelization and optimization of huge-scale sequence alignment computation (超大规模序列比对计算的并行优化)

Cited by: 2

Abstract: This paper studies the problem of huge-scale sequence alignment computation in bioinformatics. It addresses the memory, I/O, and computation-time bottlenecks that the existing e-PCR software package encounters when checking primer amplification against wheat gene sequences. Using data and task partitioning, the most critical step, aligning primers against template sequences, is made massively parallel. The program is implemented with MPI in a master-slave communication framework and further optimized for task reduction, load balancing, fault tolerance, and multi-job concurrency. It ran successfully at the thousand-core scale on a 100 Tflops-class supercomputer, completing in tens of days a wheat primer amplification alignment computation that was originally expected to take several years.
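The data and task partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' MPI code: a Python thread pool stands in for the master-slave framework, exact substring counting stands in for e-PCR matching, and every name here (`split`, `count_hits`, `run_master`, the chunk sizes, the toy sequences) is a hypothetical chosen for illustration.

```python
from itertools import product
from concurrent.futures import ThreadPoolExecutor


def split(items, chunk_size):
    """Cut a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def count_hits(task):
    """Toy stand-in for e-PCR (assumption): count exact, non-overlapping
    occurrences of each primer in each template of one task."""
    primers, templates = task
    return sum(t.count(p) for p in primers for t in templates)


def run_master(primers, templates, primer_chunk, template_chunk, workers):
    # Data partitioning: each (primer block, template block) pair is an
    # independent task, so no task needs the full data set in memory.
    tasks = list(product(split(primers, primer_chunk),
                         split(templates, template_chunk)))
    # The pool plays the master's role: an idle worker picks up the next
    # pending task, which gives simple dynamic load balancing.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(count_hits, tasks))
    return len(tasks), sum(partial)


# Hypothetical toy data, not taken from the paper.
primers = ["ACG", "TTA", "GGC", "CAT"]
templates = ["ACGTTAGGC", "CATCATACG", "TTTTTT"]
n_tasks, hits = run_master(primers, templates,
                           primer_chunk=2, template_chunk=2, workers=3)
print(n_tasks, hits)
```

Because the tasks are independent, the combined result does not depend on the chunk sizes or the number of workers. In an MPI setting the master would presumably send task descriptors to worker ranks and collect their partial results instead of relying on a shared pool.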

Authors: CAO Zongyan (曹宗雁), LANG Xianyu (郎显宇), LIU Xin (刘昕), CHI Xuebin (迟学斌)

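Affiliations: Computer Network Information Center, Chinese Academy of Sciences; Graduate University of the Chinese Academy of Sciences; Institute of Genetics and Developmental Biology, Chinese Academy of Sciences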

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2011, Issue A02, pp. 32-35 (4 pages)

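Funding: National 863 Program of China (2006AA01A116); Chinese Academy of Sciences "Eleventh Five-Year Plan" Informatization Special Project (INFO-115-B01); Chinese Academy of Sciences Knowledge Innovation Program (CNIC_QN_10004)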

Keywords: parallel computing; bioinformatics; molecular marker; sequence alignment; task partitioning; e-PCR

Classification: TP312 [Automation and Computer Technology: Computer Software and Theory]


