Abstract
Objective To build a localized, high-performance, one-stop data analysis platform that provides convenient and efficient computational analysis services for biomedical researchers. Methods Galaxy was deployed on a computing cluster and integrated with software tools and datasets. Through the Distributed Resource Management Application API (DRMAA), Galaxy worked with Sun Grid Engine to schedule and allocate computing resources automatically. Stable Web and FTP services and a management database were also set up on the cluster. Results The platform has been put into trial operation and is being continuously improved. Its peak computing capacity reaches 10 trillion operations per second and its storage capacity is 40 TB. It provides functions including sequence alignment, short-read mapping, gene annotation, transcriptome analysis, metagenomic analysis, and phylogenetic analysis, together with approximately 700 GB of reference databases covering the human genome, viruses, bacteria, and fungi. Conclusion The platform is capable of large-scale data analysis and can address the storage and processing of the massive biological data generated by high-throughput sequencing. Compared with analysis on an ordinary server, the platform's computing cluster greatly accelerates data processing and improves research efficiency.
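The abstract does not show how jobs are handed to Sun Grid Engine. As a rough illustration only, a DRMAA client usually passes scheduler options as a "native specification" string made of qsub-style flags. The sketch below builds such a string in plain Python. The `JobRequest` structure, the tool names, the queue name `all.q`, and the parallel environment name `smp` are assumptions for illustration and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    """Resources one analysis tool asks for (hypothetical structure)."""
    tool_id: str
    slots: int = 1
    mem_gb: Optional[int] = None


def build_native_spec(job: JobRequest, queue: str = "all.q",
                      parallel_env: str = "smp") -> str:
    """Return qsub-style options for use as a DRMAA native specification.

    The queue and parallel environment names are site-specific assumptions.
    """
    if job.slots < 1:
        raise ValueError("slots must be >= 1")
    parts = ["-q", queue]
    if job.slots > 1:
        # Request a multi-slot parallel environment for multi-threaded tools.
        parts += ["-pe", parallel_env, str(job.slots)]
    if job.mem_gb is not None:
        # Request a virtual memory limit via the h_vmem resource.
        parts += ["-l", f"h_vmem={job.mem_gb}G"]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_native_spec(JobRequest("short_read_mapper", slots=4, mem_gb=8)))
    print(build_native_spec(JobRequest("sequence_alignment")))
```

In a deployment like the one described, a string of this kind would be attached to each job before submission so that the scheduler, not the user, decides where the job runs.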
Source
《军事医学》
CAS
CSCD
Peking University Core Journals (北大核心)
2013, Issue 10, pp. 780-783 (4 pages)
Military Medical Sciences
Funding
Key Project of Military Logistics Science and Technology under the 12th Five-Year Plan (BWS11J070, BS212J009)