SUA-Based Algorithm for Finding SATRs in DNA Sequence

Original title: DNA序列中基于后继数组索引的SATR查找算法

Cited by: 2

Abstract: This paper studies the problem of finding approximate repeats in DNA sequences, an important problem in gene sequence analysis. After analyzing how the similarity of repeated segments is measured, two new similarity measures based on Hamming distance are proposed: pattern-similarity and segment-similarity. On the basis of these two measures, a new definition of approximate repeat is given, the segment-similarity based approximate tandem repeat (SATR). To find SATRs, a lightweight index called the succeeding unit array (SUA) is adopted, and an algorithm is designed that finds SATRs over this index. Experimental evaluation and performance analysis indicate that the SUA-based SATR finding algorithm outperforms other comparable methods in both the quality of the results found and search time.
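The abstract builds its similarity measures on Hamming distance but does not state the formulas for pattern-similarity or segment-similarity. The sketch below only illustrates the underlying building block: Hamming distance between equal-length DNA units, plus a simple matching-fraction score. The names `hamming_similarity` and `split_units` are hypothetical helpers introduced here for illustration and are not the paper's definitions.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def hamming_similarity(a: str, b: str) -> float:
    """Illustrative score in [0, 1]: fraction of matching positions.

    Assumption: this is NOT the paper's pattern-similarity or
    segment-similarity, whose formulas the abstract does not give.
    """
    if not a:
        raise ValueError("strings must be non-empty")
    return 1.0 - hamming_distance(a, b) / len(a)


def split_units(segment: str, period: int) -> List[str]:
    """Cut a segment into consecutive units of length `period`.

    Any trailing characters that do not fill a whole unit are dropped.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return [segment[i:i + period]
            for i in range(0, len(segment) - period + 1, period)]


if __name__ == "__main__":
    units = split_units("ACGTACCTAC", 4)
    print(units)
    print(hamming_distance(units[0], units[1]))
    print(hamming_similarity(units[0], units[1]))
```

Comparing adjacent units of a candidate segment in this way is one plausible reading of how a tandem repeat could be scored under substitutions only; the paper's actual measures and its SUA index may differ.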

Authors: Wang Di (王镝), Zhao Yi (赵毅), Chen Baichen (陈白尘), Wang Guoren (王国仁)

Affiliation: School of Information Science and Engineering, Northeastern University (东北大学信息科学与工程学院)

Source: Journal of Northeastern University (Natural Science) (《东北大学学报（自然科学版）》), 2007, No. 2, pp. 184-188 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).

Funding: National Natural Science Foundation of China (60273079, 60473074)

Keywords: DNA sequence; approximate repetitions; segment-similarity; SATR; succeeding unit array (SUA)

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


