
Automatic analysis pipeline of next-generation sequencing data

Cited by: 9
Abstract: Advances in next-generation sequencing have placed heavy demands on the processing and analysis of sequencing data. Many software tools exist for analyzing next-generation sequencing data, but most perform only a single function (e.g., sequence alignment, variant calling, or functional annotation), so selecting and integrating them correctly and efficiently has become a pressing need. This study designed an automated pipeline, based on the Perl programming language and the SGE resource management system, for analyzing genome sequencing data from the Illumina platform. The pipeline takes raw sequence data (FASTQ format) as input, calls standard data processing software (e.g., BWA, Samtools, GATK, and ANNOVAR), and outputs a list of variants with functional annotations that researchers can analyze further. Automated parallel scripts control the run, and the analysis results and reports are produced in a single pass, which reduces manual operation and improves running efficiency. Users run the pipeline by editing a configuration file or by entering settings through a graphical interface. This work offers researchers a convenient way to analyze next-generation sequencing data.
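Source: Hereditas (Beijing) (《遗传》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2014, Issue 6, pp. 618-624 (7 pages)
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2010CB529505) and the Fundamental Research Funds for the Central Universities (No. 2012-XHGX02)
Keywords: next-generation sequencing; automatic data analysis; pipeline; variant detection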
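
The abstract describes a chain of tools (BWA, Samtools, GATK, ANNOVAR) run under SGE with automated parallel scripts. The paper's own implementation is in Perl and is not reproduced here; the following is only a minimal Python sketch of how such a chain might be submitted, assuming one job per sample and stage. The wrapper script names (`align.sh`, `sort_index.sh`, `call_variants.sh`, `annotate.sh`) and the job naming scheme are hypothetical. The `qsub` options `-cwd`, `-N`, and `-hold_jid` are standard SGE options.

```python
# Stage order follows the abstract. The wrapper scripts are hypothetical
# placeholders; each would contain the real tool invocation.
STAGES = [
    ("align", "align.sh"),          # would wrap BWA
    ("sort", "sort_index.sh"),      # would wrap Samtools
    ("call", "call_variants.sh"),   # would wrap GATK
    ("annotate", "annotate.sh"),    # would wrap ANNOVAR
]


def build_qsub_commands(samples):
    """Return one qsub argument list per sample and stage, in submission order.

    Within a sample, each stage waits for the previous one via -hold_jid.
    Different samples have no dependencies on each other, so the scheduler
    is free to run them in parallel.
    """
    commands = []
    for sample in samples:
        previous_job = None
        for stage, script in STAGES:
            job_name = f"{stage}_{sample}"
            cmd = ["qsub", "-cwd", "-N", job_name]
            if previous_job is not None:
                # Hold this job until the previous stage of the same sample ends.
                cmd += ["-hold_jid", previous_job]
            cmd += [script, sample]
            commands.append(cmd)
            previous_job = job_name
    return commands


if __name__ == "__main__":
    # Print the commands instead of submitting them, so the sketch runs anywhere.
    for cmd in build_qsub_commands(["sampleA", "sampleB"]):
        print(" ".join(cmd))
```

In this sketch the sample list stands in for the configuration file mentioned in the abstract. Printing the commands rather than executing them keeps the example runnable on a machine without SGE installed.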

