
A New Compression Scheme for DNA Sequences Based on Statistical Analysis and Segmented Codebook
Abstract: The DNA sequence is divided into short segments of 64 bases each. Based on the base-arrangement characteristics of each segment, the nucleotide triplet that repeats most often within the segment is encoded with a dedicated codebook. On this basis, a compression scheme for DNA data based on statistical analysis and a segmented codebook is put forward. Experiments show that the proposed algorithm achieves good compression performance on most of the common benchmark sequences.
Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2012, Issue 29, pp. 7505-7509, 7514 (6 pages)
Keywords: DNA sequences; statistical analysis; codebook; segmented encoding
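
The statistical-analysis step described in the abstract can be illustrated with a minimal sketch. It only splits a sequence into 64-base segments and finds the most frequent triplet in each one. The function names, the use of overlapping triplets, and the alphabetical tie-break are assumptions made for illustration; the abstract does not specify them, and the codebook itself is not shown because its layout is not given here.

```python
from collections import Counter

SEGMENT_LENGTH = 64  # segment size stated in the abstract


def split_segments(sequence, length=SEGMENT_LENGTH):
    """Split a DNA string into consecutive segments of `length` bases.

    The final segment may be shorter if the sequence length is not a
    multiple of `length`.
    """
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


def most_frequent_triplet(segment):
    """Return (triplet, count) for the most frequent 3-base window.

    Assumption: windows overlap (step of one base). Ties are broken
    alphabetically so the result is deterministic. Returns None when
    the segment has fewer than three bases.
    """
    counts = Counter(segment[i:i + 3] for i in range(len(segment) - 2))
    if not counts:
        return None
    # Highest count first; alphabetical order among equal counts.
    return min(counts.items(), key=lambda item: (-item[1], item[0]))


def analyse(sequence):
    """Pair each segment with its most frequent triplet (or None)."""
    return [(seg, most_frequent_triplet(seg)) for seg in split_segments(sequence)]
```

Calling `analyse` on a sequence gives one `(segment, (triplet, count))` pair per 64-base segment. An encoder along the lines of the abstract would then pick a codebook for each segment according to its dominant triplet.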

References (6)

  • 1 Ji Zhen, Zhou Jiarui, Jiang Lai, Q. H. Wu. A survey of DNA sequence data compression techniques [J]. Acta Electronica Sinica (电子学报), 2010, 38(5): 1113-1121. Cited by: 8
  • 2 Zhang Lixia, Zhang Yiqing, Lin Piyuan, Liu Jiping. DNA compressed pattern matching algorithm based on characters and 0/1 codes [J]. Application Research of Computers (计算机应用研究), 2007, 24(9): 22-24. Cited by: 3
  • 3 Ferreira P J S G, Neves A J R, et al. Exploring three-base periodicity for DNA compression and modeling. Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing. Toulouse, 2006: 877-880.
  • 4 Chen X, Kwong S, et al. A compression algorithm for DNA sequences and its applications in genome comparison. Proceedings of the 10th Workshop on Genome Informatics. Tokyo: GIW, 1999: 51-61.
  • 5 Korodi G, Tabus I, et al. DNA sequence compression based on the normalized maximum likelihood model. IEEE Signal Processing Magazine, 2007; 24(1): 47-53.
  • 6 Ji Zhen, Zhou Jiarui, Zhu Zexuan, Q. H. Wu. DNA sequence data compression algorithm based on bioinformatics features [J]. Acta Electronica Sinica (电子学报), 2011, 39(5): 991-995. Cited by: 8
