
Building a Localized One-Stop Bioinformatics Platform Using Galaxy and High-Performance Computing Clusters
Abstract

Objective: To build a localized, high-performance, one-stop data analysis platform that provides convenient and efficient computational analysis services for biomedical researchers.

Methods: Galaxy was deployed on a computing cluster and integrated with analysis tools and datasets. Through the Distributed Resource Management Application API (DRMAA), Galaxy worked together with Sun Grid Engine to schedule jobs and allocate computing resources automatically. Stable Web and FTP services and a management database were also set up on the cluster.

Results: The platform has entered trial operation and is being continuously improved. It has a peak computing capacity of 10 trillion operations per second and 40 TB of storage. It provides sequence alignment, short-read mapping, gene annotation, transcriptome analysis, metagenomic analysis, and phylogenetic analysis, together with about 700 GB of reference databases covering the human genome, viruses, bacteria, and fungi.

Conclusion: The platform can analyze large-scale data and addresses the storage and processing of the massive biological datasets generated by high-throughput sequencing. Compared with analysis on an ordinary server, its computing cluster greatly accelerates data processing and improves research efficiency.
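The abstract says Galaxy hands jobs to Sun Grid Engine through DRMAA so that resources are assigned automatically. One common way to steer that allocation is to pass SGE's own qsub-style options through DRMAA as a "native specification" string (the drmaa Python bindings expose this as a job template's `nativeSpecification` attribute, and Galaxy's DRMAA runner accepts a parameter of the same name on a job destination). The sketch below is not the authors' configuration: it only shows how per-tool resource requests could be rendered into such a string. The tool ids, slot and memory figures, the `smp` parallel environment, and the `all.q` queue are all hypothetical and would differ on a real cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceRequest:
    """Resources a single analysis job asks SGE for (hypothetical fields)."""
    slots: int = 1               # CPU slots, requested through a parallel environment
    mem_per_slot_gb: int = 2     # in many SGE setups h_vmem is enforced per slot
    queue: Optional[str] = None  # e.g. "all.q"; None lets SGE choose a queue
    pe_name: str = "smp"         # parallel environment name is site-specific (assumption)


# Hypothetical per-tool profiles; a real site would tune these to its nodes.
TOOL_PROFILES = {
    "sequence_alignment": ResourceRequest(slots=4, mem_per_slot_gb=2),
    "short_read_mapping": ResourceRequest(slots=8, mem_per_slot_gb=4),
    "transcriptome_assembly": ResourceRequest(slots=8, mem_per_slot_gb=8, queue="all.q"),
}
DEFAULT_REQUEST = ResourceRequest()


def sge_native_spec(req: ResourceRequest) -> str:
    """Render a request as qsub-style options for a DRMAA native specification."""
    if req.slots < 1 or req.mem_per_slot_gb < 1:
        raise ValueError("slots and memory must be positive")
    opts = []
    if req.queue:
        opts += ["-q", req.queue]
    if req.slots > 1:
        # Only multi-slot jobs need a parallel environment request.
        opts += ["-pe", req.pe_name, str(req.slots)]
    opts += ["-l", f"h_vmem={req.mem_per_slot_gb}G"]
    return " ".join(opts)


def spec_for_tool(tool_id: str) -> str:
    """Look up a tool's profile, falling back to a single-slot default."""
    return sge_native_spec(TOOL_PROFILES.get(tool_id, DEFAULT_REQUEST))


if __name__ == "__main__":
    for tool in ("short_read_mapping", "transcriptome_assembly", "unknown_tool"):
        print(f"{tool}: {spec_for_tool(tool)}")
```

With these assumed profiles, `spec_for_tool("short_read_mapping")` yields `-pe smp 8 -l h_vmem=4G`, while an unlisted tool falls back to `-l h_vmem=2G`. Keeping the mapping in one table means heavy tools can be given more slots without touching the lightweight ones.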
Source: Military Medical Sciences (《军事医学》), 2013, no. 10, pp. 780-783 (4 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
Funding: Key Project of Military Logistics Science and Technology Research under the 12th Five-Year Plan (BWS11J070, BS212J009)
Keywords: localized; one-stop; bioinformatics; Galaxy; online analysis; high-performance computing