
SUA-Based Algorithm for Finding SATRs in DNA Sequence (cited by: 2)
Abstract: This paper studies the problem of finding approximate repetitions in DNA sequences, an important problem in gene sequence analysis. After analyzing how the similarity of repeated segments is measured, it proposes two new similarity measures based on Hamming distance, pattern-similarity and segment-similarity. On the basis of these two measures it gives a new definition of approximate repetition, the segment-similarity based approximate tandem repeat (SATR). To find SATRs, it introduces a lightweight index, the succeeding unit array (SUA), and designs an algorithm that finds SATRs on that index. Experimental evaluation and performance analysis show that the SUA-based SATR finding algorithm outperforms other comparable methods in both the results found and the search time.
Source: Journal of Northeastern University (Natural Science), 2007, No. 2, pp. 184-188 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60273079, 60473074)
Keywords: DNA sequence; approximate repetitions; segment-similarity; SATR; succeeding unit array (SUA)
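
The abstract builds its similarity measures on Hamming distance but does not reproduce their definitions here. As a rough illustration only, the sketch below computes the Hamming distance between two equal-length DNA units and compares adjacent units in a candidate tandem run. The names `unit_similarity` and `adjacent_unit_similarities`, and the choice of "fraction of matching positions" as a score, are assumptions made for this example; they are not the paper's pattern-similarity or segment-similarity, and the succeeding unit array index is not modeled.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def unit_similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: fraction of matching positions."""
    if not a:
        raise ValueError("units must be non-empty")
    return 1.0 - hamming_distance(a, b) / len(a)


def adjacent_unit_similarities(seq: str, start: int, unit_len: int, copies: int) -> list:
    """Score each unit of a candidate tandem run against the unit after it."""
    # Cut the candidate run into consecutive units of unit_len bases.
    units = [seq[start + i * unit_len: start + (i + 1) * unit_len]
             for i in range(copies)]
    if any(len(u) != unit_len for u in units):
        raise ValueError("candidate run extends past the end of the sequence")
    return [unit_similarity(u, v) for u, v in zip(units, units[1:])]
```

For example, `adjacent_unit_similarities("ACGTACGAACGA", 0, 4, 3)` scores the unit pairs `ACGT`/`ACGA` and `ACGA`/`ACGA`. A threshold on such scores is one simple way to decide whether a run counts as an approximate tandem repeat, though the paper's actual criterion may differ.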
