Research on Data Compression of Short-Sequence Biological Data Based on Next-Generation Sequencing (cited by: 2)
Abstract The development of next-generation sequencing (NGS) technology has caused sequencing data volumes to grow rapidly, placing heavy pressure on data storage and transmission. Data compression is an important way to address this problem, but traditional compression methods do not exploit the characteristics of the data well, so researchers have turned to compression methods designed specifically for NGS data. This paper presents a comprehensive survey of compression algorithms for the FASTQ and FASTA data produced by NGS: it introduces the features of the two formats and summarizes the compression methods in common use. Several representative compression tools are then tested on sequencing data from different species, sequencing platforms, and scales, to compare their compression performance and verify their characteristics, giving researchers guidance for selecting tools or improving tool performance. Finally, the paper summarizes the open problems and development trends of short-sequence data compression tools.
Author 孟倩 (Meng Qian)
Source Computer Applications and Software (《计算机应用与软件》), 2017, No. 4, pp. 22-27, 98 (7 pages in total)
Keywords data compression; short-sequence data compression; next-generation sequencing