Abstract
This paper studies the problem of finding approximate repeats in DNA sequences, an important problem in gene sequence analysis. After analyzing how the similarity of repeated segments is measured, it proposes two new similarity measures based on Hamming distance, pattern-similarity and segment-similarity, and on that basis gives a new definition of approximate repeats: segment-similarity based approximate tandem repeats (SATR). To find SATRs, a lightweight index, the succeeding unit array (SUA), is introduced, and an algorithm that finds SATRs on the SUA is designed. Experimental evaluation and performance analysis show that the SUA-based SATR finding algorithm outperforms comparable methods in both the results found and the search time.
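For readers unfamiliar with the terms, the following is a minimal sketch of what a Hamming-distance-based search for approximate tandem repeats can look like. It is not the paper's method: the abstract does not give the formal definitions of pattern-similarity, segment-similarity, or the SUA index, so none of them are reproduced here. The function names `hamming` and `naive_tandem_repeats`, the fixed `period`, and the `max_mismatch` threshold are all assumptions made for illustration, and the scan is a naive greedy left-to-right pass with no index.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def naive_tandem_repeats(seq: str, period: int, max_mismatch: int,
                         min_copies: int = 2) -> list[tuple[int, int]]:
    """Greedy illustrative scan (hypothetical, not SATR/SUA).

    Returns (start, copies) for runs of adjacent length-`period` units in
    which every later unit is within `max_mismatch` substitutions of the
    first unit of the run.
    """
    hits = []
    n = len(seq)
    i = 0
    while i + period * min_copies <= n:
        pattern = seq[i:i + period]
        copies = 1
        j = i + period
        # Extend the run while the next unit stays close to the first unit.
        while j + period <= n and hamming(pattern, seq[j:j + period]) <= max_mismatch:
            copies += 1
            j += period
        if copies >= min_copies:
            hits.append((i, copies))
            i = j          # continue after the reported run
        else:
            i += 1         # no run here; slide by one base
    return hits


if __name__ == "__main__":
    # "ACGT", "ACGA", "ACGT" are adjacent units; the middle one has one substitution.
    print(naive_tandem_repeats("ACGTACGAACGTTTTT", period=4, max_mismatch=1))
```

With `max_mismatch=0` the same sequence yields no repeat, which shows why a tolerance on substitutions is needed to detect approximate copies at all.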
Source
《东北大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals
2007, No. 2, pp. 184-188 (5 pages)
Journal of Northeastern University (Natural Science)
Funding
Supported by the National Natural Science Foundation of China (60273079, 60473074)