Abstract
Biomedicine is an emerging, frontier interdisciplinary field that draws on the theories and methods of medicine, the life sciences, and biology. In recent years, advanced instruments and information technology have been integrated ever more broadly and deeply into biotechnology, so biomedical research increasingly depends on big data storage and analysis. The arrival of the big data era has had a major impact on biomedical research. One important trend is the shift from hypothesis-driven to data-driven research. For decades, experiments in molecular biology were designed to reach a conclusion or to propose a new hypothesis; with massive biomedical data sets now available, researchers can explore the regularities in the data and derive hypotheses or reliable conclusions directly from them. As advanced bioanalytical technologies continue to be introduced and updated, biomedical data are accumulating rapidly. Such data promise to resolve problems that were previously intractable, while new problems in biomedical research keep emerging. Biomedical big data technologies and applications mainly include personalized genomic, transcriptomic, and proteomic studies based on high-throughput sequencing; genotyping and phenotyping at the single-cell level; research on microbial communities related to human health; and biomedical imaging. These analysis tasks are both data intensive and computation intensive, so high-throughput, efficient, and accurate strategies for storing and analyzing biological information are urgently needed. In this article, we summarize and review issues in the generation, management, and analysis of biomedical big data, focusing on newly emerging forms of data (human microbial communities, single-cell phenotypes and genotypes, and biomedical images) and on the associated data analysis and application prospects. We conclude that biomedical big data research is gaining momentum but is still held back by immature hardware and software platforms, storage, and analysis and mining methods. Once these bottlenecks are overcome, such methods should support in-depth analysis of biomedical big data, aid trend analysis and prediction of medical phenomena, and serve applications including genetic disease research, public health monitoring, and medical and drug development.
Source
Chinese Science Bulletin (《科学通报》)
EI
CAS
CSCD
PKU Core (北大核心)
2015, Issue 5, pp. 534-546 (13 pages)
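Funding
Supported by the National Natural Science Foundation of China (30870572, 61303161, 61103167)
and the National High Technology Research and Development Program of China (2014AA021502)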
Keywords
biomedical research, big data, microbial community, single-cell, bio-imaging, data mining