
Current Status and Prospects of Genomics Data Analysis Methods (cited by: 2)
Abstract: [Objective] This review comprehensively describes the current status and future trends of genomics data analysis methods, providing a reference for algorithm research and tool development for omics data analysis in precision medicine, precision breeding, biosafety, biodiversity, and molecular evolution. [Results] Genomics data analysis mainly comprises the analysis of genomic, transcriptomic, and epigenomic data. At present, genomics data pose challenges chiefly because they are massive, multidimensional, and heterogeneous. This review details the current status, applications, open problems, and challenges of algorithm and tool development for genomics data analysis. [Conclusions] The future direction of algorithm and tool development for genomics data analysis is to make full use of advanced technologies such as artificial intelligence, statistical models, and knowledge graphs, and to continuously optimize and develop more advanced algorithms and more robust models that combine high fault tolerance, high accuracy, high efficiency, and low consumption of computing resources, so as to meet the demands of analyzing massive, multidimensional, and heterogeneous genomics big data.
Authors: Chen Meili (陈梅丽); Ma Yingke (马英克); Li Rujiao (李茹姣); Bao Yiming (鲍一明) (National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Frontiers of Data & Computing (《数据与计算发展前沿》), 2020, Issue 2, pp. 1-19 (19 pages)
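Funding: National Key Research and Development Program of China, "International Life Omics Data Sharing Program" (2016YFE0206600); National Key Research and Development Program of China, "Compatibility and Integration of Disease Omics Data" (2017YFC0908403); Strategic Priority Research Program of the Chinese Academy of Sciences (Category B), "Multidimensional Big Data-Driven Precision Health Research in Chinese Populations" (XDB38000000); Informatization Program of the Chinese Academy of Sciences, "Big Data-Driven Innovation Demonstration Platform for Bioinformatics" (XXH13505-05); Pioneer Initiative "Hundred Talents Program" of the Chinese Academy of Sciences.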
Keywords: genome; transcriptome; epigenome; big data analysis; multi-source heterogeneous data integration
