Abstract
[Objective] Through a comprehensive review of the current status and future trends of genomics data analysis methods, this paper provides a reference for research on algorithms and tool development for omics data analysis in precision medicine, precision breeding, biosafety, biodiversity, and molecular evolution. [Results] Genomics data analysis mainly comprises the analysis of genomic, transcriptomic, and epigenomic data. At present, genomics data analysis faces challenges chiefly because the data are massive, multidimensional, and heterogeneous. This review describes in detail the current status, applications, existing problems, challenges, and prospects of algorithm and tool development for genomics data analysis. [Conclusions] Future algorithm and tool development for genomics data analysis should make full use of advanced technologies such as artificial intelligence, statistical models, and knowledge graphs, and should continuously optimize and develop more advanced algorithms and more robust models that combine high fault tolerance, high accuracy, high efficiency, and low computational resource consumption, so as to meet the demands of analyzing massive, multidimensional, and heterogeneous genomics big data.
Authors
Chen Meili
Ma Yingke
Li Rujiao
Bao Yiming
National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Source
Frontiers of Data & Computing
2020, Issue 2, pp. 1-19 (19 pages)
Funding
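National Key R&D Program of China, "International Life Omics Data Sharing Program" (2016YFE0206600)
National Key R&D Program of China, "Compatibility and Integration of Disease Omics Data" (2017YFC0908403)
Strategic Priority Research Program of the Chinese Academy of Sciences (Category B), "Precision Health Research in the Chinese Population Driven by Multidimensional Big Data" (XDB38000000)
Informatization Program of the Chinese Academy of Sciences, "Big-Data-Driven Innovation Demonstration Platform for Bioinformatics" (XXH13505-05)
CAS Pioneer Hundred Talents Program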
Keywords
genome
transcriptome
epigenome
big data analysis
multi-source heterogeneous data integration