Abstract
With the rapid development of the life sciences and medical informatization, biomedical data have grown explosively, and processing them poses challenges of large data volume, complex multi-dimensional relationships, and demanding interactive response requirements. Traditional databases and the Hadoop framework both have shortcomings in processing biomedical big data. Spark is an emerging open-source big data platform based on in-memory computing, with rich programming interfaces, a general-purpose processing framework, and diverse deployment modes. This article introduces the key technologies and features of Spark, analyzes the characteristics of biomedical big data from different sources together with successful Spark use cases, and discusses the applicability and potential advantages of Spark in biomedical big data processing.
Source
Chinese Journal of Library and Information Science for Traditional Chinese Medicine (《中国中医药图书情报杂志》)
2015, Issue 2, pp. 1-5 (5 pages)
Funding
Key Project of Military Logistics Science and Technology under the 12th Five-Year Plan (BS211R008, BS212J009)