
BIGpre: A Quality Assessment Package for Next-Generation Sequencing Data (cited by: 4)

Abstract: The emergence of next-generation sequencing (NGS) technologies has significantly improved sequencing throughput and reduced costs. However, the short read length, duplicate reads and massive volume of data make data processing much more difficult and complicated than with first-generation sequencing technology. Although some software packages have been developed to assess data quality, those packages either are not easily available to users or require bioinformatics skills and computer resources. Moreover, almost all the quality assessment software currently available did not take sequencing errors into account when assessing duplicates in NGS data. Here, we present a new user-friendly quality assessment software package called BIGpre, which works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account, and to trim low-quality reads from raw data as well. BIGpre is primarily written in Perl and integrates graphical capability from the statistics package R. This package produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions of reads within minutes, this package provides immediate diagnostic information for users to manipulate sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net/.
Source: Genomics, Proteomics & Bioinformatics (基因组蛋白质组与生物信息学报, English edition), SCIE CAS CSCD, 2011, Issue 6, pp. 238-244 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31000561 and 30900825) and the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX2-EW-R-01-04).
Keywords: next-generation sequencing, quality assessment, duplicate reads, sequencing error
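
The abstract names three kinds of per-read operations: GC-content statistics, trimming of low-quality bases, and duplicate detection that tolerates sequencing errors. The Python sketch below illustrates these ideas in their simplest form. It is not BIGpre's implementation (BIGpre is written in Perl, and its algorithms are not described in this record). The function names, the Phred threshold of 20, the one-mismatch tolerance and the quadratic pairwise comparison are all assumptions chosen for illustration.

```python
def gc_content(seq):
    """Fraction of G/C among called (A/C/G/T) bases; 0.0 if no base is called."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def trim_low_quality_tail(seq, quals, min_q=20):
    """Drop trailing bases whose Phred score is below min_q (assumed threshold)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_near_duplicates(reads, max_mismatches=1):
    """Keep the first read of each group of equal-length reads that differ
    by at most max_mismatches bases. The pairwise scan is quadratic, so it
    only suits small illustrative inputs."""
    kept = []
    for read in reads:
        is_duplicate = any(
            len(read) == len(k) and hamming(read, k) <= max_mismatches
            for k in kept
        )
        if not is_duplicate:
            kept.append(read)
    return kept
```

Under these assumptions, two reads that differ at a single position are treated as one duplicate group, so a single miscalled base does not hide a duplicate. A tool handling hundreds of millions of reads would need indexing or hashing rather than the pairwise comparison used here.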

