
An effective algorithm for repeat sequence finding (cited by: 2)
Abstract: Analysis of repeat sequences is an important topic in genomic research. The foundation of this analysis is finding the previously unknown repeat sequences in a genomic sequence quickly and effectively. In this paper, we propose a projection-assembly algorithm to find these repeats. The algorithm uses random projection to obtain a set of candidate segments, then exhaustively searches pairs of segments for potential links between them and assembles the linked segments together. The running time of the projection-assembly algorithm is nearly linear in the length of the genomic sequence, and its memory usage is exponential in a parameter that depends on the length of the genomic sequence. This is not a serious problem, because for sequences with lengths of up to tens of millions this parameter can be set to a constant. We construct a semi-simulated test dataset to evaluate the algorithm, and the results show that it finds the repeat segments effectively.
Source: Chinese Journal of Bioinformatics (《生物信息学》), 2005, No. 4, pp. 163-166, 174 (5 pages)
Keywords: repeat sequences; random projection; assembly
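
The two stages named in the abstract, random projection followed by assembly, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the parameters `l` (window length), `k` (number of projected positions), and `trials`, and the simple interval merge that stands in for the paper's exhaustive pairwise linkage search are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_projection_buckets(seq, l, k, seed=0):
    """Group start positions of all l-mers by their letters at k random offsets."""
    rng = random.Random(seed)
    # The projection: keep k of the l positions inside each window (requires k <= l).
    offsets = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for start in range(len(seq) - l + 1):
        key = "".join(seq[start + o] for o in offsets)
        buckets[key].append(start)
    return buckets


def candidate_positions(seq, l, k, trials=5, seed=0):
    """Collect start positions whose l-mer shares a bucket with another l-mer."""
    hits = set()
    for t in range(trials):
        # Each trial uses a different random projection.
        for starts in random_projection_buckets(seq, l, k, seed + t).values():
            if len(starts) >= 2:
                hits.update(starts)
    return sorted(hits)


def assemble(positions, l):
    """Merge overlapping or adjacent l-mer windows into half-open (start, end) intervals.

    Assumed stand-in for the assembly stage; the paper's own linkage test is not reproduced.
    """
    segments = []
    for p in positions:
        if segments and p <= segments[-1][1]:
            segments[-1][1] = max(segments[-1][1], p + l)
        else:
            segments.append([p, p + l])
    return [tuple(s) for s in segments]
```

In this sketch the number of distinct bucket keys over the DNA alphabet is at most 4^k, so `k` controls the trade-off between memory and the number of chance collisions. Calling `assemble(candidate_positions(seq, l=8, k=5), 8)` on a sequence returns the merged intervals that contain colliding windows; short low-complexity background can also collide, so the intervals are candidates rather than confirmed repeats.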

